Abstract

Can Grover's algorithm speed up search of a physical region, for example a 2-D grid of size $\sqrt{n} \times \sqrt{n}$?
The problem is that $\sqrt{n}$ time seems to be needed for each query, just to move amplitude across the grid.
Here we show that this problem can be surmounted, refuting a claim to the contrary by Benioff. In
particular, we show how to search a d-dimensional hypercube in time $O(\sqrt{n})$ for $d \geq 3$, or $O(\sqrt{n} \log^{5/2} n)$
for d = 2. More generally, we introduce a model of quantum query complexity on graphs, motivated by 
fundamental physical limits on information storage, particularly the holographic principle from black 
hole thermodynamics. Our results in this model include almost-tight upper and lower bounds for many 
search tasks; a generalized algorithm that works for any graph with good expansion properties, not just 
hypercubes; and relationships among several notions of 'locality' for unitary matrices acting on graphs. 
As an application of our results, we give an $O(\sqrt{n})$-qubit communication protocol for the disjointness
problem, which improves an upper bound of Høyer and de Wolf and matches a lower bound of Razborov.

1 Introduction 

The goal of Grover's quantum search algorithm [17, 18] is to search an 'unsorted database' of size n in a
number of queries proportional to $\sqrt{n}$. Classically, of course, order n queries are needed. It is sometimes
asserted that, although the speedup of Grover's algorithm is only quadratic, this speedup is provable, in
contrast to the exponential speedup of Shor's factoring algorithm. But is that really true? Grover's
algorithm is typically imagined as speeding up combinatorial search, and we do not know whether every
problem in NP can be classically solved quadratically faster than the "obvious" way, any more than we know
whether factoring is in BPP.

But could Grover's algorithm speed up search of a physical region? Here the basic problem, it seems to 
us, is the time needed for signals to travel across the region. For if we are interested in the fundamental 
limits imposed by physics, then we should acknowledge that the speed of light is finite, and that a bounded 
region of space can store only a finite amount of information, according to the holographic principle [9]. We
discuss the latter constraint in detail in Section 2; for now, we say only that it suggests a model in which
a 'quantum robot' occupies a superposition over finitely many locations, and moving the robot from one
location to an adjacent one takes unit time. In such a model, the time needed to search a region could
depend critically on its spatial layout. For example, if the n entries are arranged on a line, then even to
move the robot from one end to the other takes $n - 1$ steps. But what if the entries are arranged on, say, a
2-dimensional square grid (Figure 1)?



1.1 Summary of Results 

This paper gives the first systematic treatment of quantum search of spatial regions, with 'regions' modeled
as connected graphs. Our main result is positive: we show that a quantum robot can search a d-dimensional
hypercube with n vertices for a unique marked vertex in time $O(\sqrt{n} \log^{3/2} n)$ when $d = 2$, or $O(\sqrt{n})$
when $d \geq 3$. This matches (or in the case of 2 dimensions, nearly matches) the $\Omega(\sqrt{n})$ lower bound for
quantum search, and supports the view that Grover search of a physical region presents no problem of
principle. Our basic technique is divide-and-conquer; indeed, once the idea is pointed out, an upper bound
of $O(n^{1/2+\varepsilon})$ follows readily. However, to obtain the tighter bounds is more difficult; for that we use the
amplitude-amplification framework of Grover [19] and Brassard et al. [11].

Figure 1: A quantum robot, in a superposition over locations, searching for a marked item on a 2D grid of
size $\sqrt{n} \times \sqrt{n}$.

Table 1: Upper and lower bounds for quantum search on a d-dimensional graph given in this paper. The
symbol $\widetilde{\Theta}$ means that the upper bound includes a polylogarithmic term. Note that, if d = 2, then $\Omega(\sqrt{n})$
is always a lower bound, for any number of marked items.

Section 5 presents the main results; Section 5.4 shows further that, when there are k or more marked
vertices, the search time becomes $O(\sqrt{n} \log^{5/2} n)$ when $d = 2$, or $\Theta(\sqrt{n} / k^{1/2 - 1/d})$ when $d \geq 3$. Also,
Section 6 generalizes our algorithm to arbitrary graphs that have 'hypercube-like' expansion properties.
Here the best bounds we can achieve are $\sqrt{n}\, 2^{O(\sqrt{\log n})}$ when $d = 2$, or $O(\sqrt{n}\, \mathrm{polylog}\, n)$ when $d > 2$ (note
that d need not be an integer). Table 1 summarizes the results.

Section 7 shows, as an unexpected application of our search algorithm, that the quantum communication
complexity of the well-known disjointness problem is $O(\sqrt{n})$. This improves an $O(\sqrt{n}\, c^{\log^* n})$ upper bound
of Høyer and de Wolf [20], and matches the $\Omega(\sqrt{n})$ lower bound of Razborov [23].

The rest of the paper is about the formal model that underlies our results. Section 2 sets the stage for
this model, by exploring the ultimate limits on information storage imposed by properties of space and time.
This discussion serves only to motivate our results; thus, it can be safely skipped by readers unconcerned
with the physical universe. In Section 3 we define quantum query algorithms on graphs, a model similar
to quantum query algorithms as defined by Beals et al. [2], but with the added requirement that unitary
operations be 'local' with respect to some graph. In Section 3.1 we address the difficult question, which
also arises in work on quantum random walks [1] and quantum cellular automata [31], of what 'local' means.
Section 4 proves general facts about our model, including an upper bound of $O(\sqrt{n\delta})$ for the time needed
to search any graph with diameter $\delta$, and a proof (using the hybrid argument of Bennett et al.) that this
upper bound is tight for certain graphs. We conclude in Section 8 with some open problems.

Table 2: Time needed to find a unique marked item in a d-dimensional hypercube, using the divide-and-conquer
algorithms of this paper, the original quantum walk algorithm of Childs and Goldstone, and
the improved walk algorithms of Ambainis, Kempe, and Rivosh and of Childs and Goldstone [15].

1.2 Related Work 

In a paper on 'Space searches with a quantum robot,' Benioff asked whether Grover's algorithm can speed
up search of a physical region, as opposed to a combinatorial search space. His answer was discouraging:
for a 2-D grid of size $\sqrt{n} \times \sqrt{n}$, Grover's algorithm is no faster than classical search. The reason is that,
during each of the $O(\sqrt{n})$ Grover iterations, the algorithm must use order $\sqrt{n}$ steps just to travel across
the grid and return to its starting point for the diffusion step. On the other hand, Benioff noted, Grover's
algorithm does yield some speedup for grids of dimension 3 or higher, since those grids have diameter less
than $\sqrt{n}$.
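
Benioff's accounting can be checked numerically. The following is a minimal sketch, not taken from the paper: the function names are hypothetical, constants are chosen for illustration, and each Grover iteration is assumed to cost one round trip across the grid.

```python
import math

def grid_diameter(n: int, d: int) -> float:
    """Diameter of a d-dimensional grid with n vertices and side length n**(1/d)."""
    side = n ** (1.0 / d)
    return d * (side - 1)

def naive_grover_steps(n: int, d: int) -> float:
    """About sqrt(n) Grover iterations, each assumed to cost a round trip across the grid."""
    return math.sqrt(n) * 2 * grid_diameter(n, d)

def classical_steps(n: int) -> int:
    """A classical robot visits every vertex, so order n steps."""
    return n
```

Under these assumptions, for n = 10^6 the naive quantum cost exceeds the classical cost when d = 2 but falls below it when d = 3, in line with Benioff's observation.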

Our results show that Benioff's claim is mistaken: by using Grover's algorithm more carefully, one can
search a 2-D grid for a single marked vertex in $O(\sqrt{n} \log^{3/2} n)$ time. To us this illustrates why one should
not assume an algorithm is optimal on heuristic grounds. Painful experience (for example, the "obviously
optimal" $O(n^3)$ matrix multiplication algorithm [30]) is what taught computer scientists to see the proving
of lower bounds as more than a formality.

Our setting is related to that of quantum random walks on graphs [Tl 1131 flTl I28| . In an earlier version 
of this paper, we asked whether quantum walks might yield an alternative spatial search algorithm, possibly 
even one that outperforms our divide-and-conquer algorithm. Motivated by this question, Childs and 
Goldstone managed to show that in the continuous-time setting, a quantum walk can search a d-dimensional
hypercube for a single marked vertex in time $O(\sqrt{n} \log n)$ when $d = 4$, or $O(\sqrt{n})$ when $d \geq 5$.
Our algorithm was still faster in 3 or fewer dimensions (see Table 2). Subsequently, however, Ambainis,
Kempe, and Rivosh |2j gave an algorithm based on a discrete-time quantum walk, which was as fast as ours 
in 3 or more dimensions, and faster in 2 dimensions. In particular, when d = 2 their algorithm used only 
$O(\sqrt{n} \log n)$ time to find a unique marked vertex. Childs and Goldstone [15] then gave a continuous-time
quantum walk algorithm with the same performance, and related this algorithm to properties of the Dirac
equation. It is still open whether $O(\sqrt{n})$ time is achievable in 2 dimensions.

Currently, the main drawback of the quantum walk approach is that all analyses have relied heavily on 
symmetries in the underlying graph. If even minor 'defects' are introduced, it is no longer known how to 
upper-bound the running time. By contrast, the analysis of our divide-and-conquer algorithm is elementary, 
and does not depend on eigenvalue bounds. We can therefore show that the algorithm works for any graphs 
with sufficiently good expansion properties. 

Childs and Goldstone [16] argued that the quantum walk approach has the advantage of requiring fewer
auxiliary qubits than the divide-and-conquer approach. However, the need for many qubits was an artifact 
of how we implemented the algorithm in a previous version of the paper. The current version uses only one 
qubit. 

2 The Physics of Databases 

Theoretical computer science generally deals with the limit as some resource (such as time or memory) 
increases to infinity. What is not always appreciated is that, as the resource bound increases, physical 
constraints may come into play that were negligible at 'sub-asymptotic' scales. We believe theoretical 
computer scientists ought to know something about such constraints, and to account for them when possible. 
For if the constraints are ignored on the ground that they "never matter in practice," then the obvious
question arises: why use asymptotic analysis in the first place, rather than restricting attention to those 
instance sizes that occur in practice? 

A constraint of particular interest for us is the holographic principle [9], which arose from black-hole
thermodynamics. The principle states that the information content of any spatial region is upper-bounded
by its surface area (not volume), at a rate of one bit per Planck area, or about $1.4 \times 10^{69}$ bits per square
meter. Intuitively, if one tried to build a spherical hard disk with mass density $v$, one could not keep
expanding it forever. For as soon as the radius reached the Schwarzschild bound of $r = \sqrt{3/(8\pi v)}$ (in
Planck units, $c = G = \hbar = k = 1$), the hard disk would collapse to form a black hole, and thus its contents
would be irretrievable.
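
Both numbers in this paragraph can be sanity-checked with a short calculation. The sketch below rests on assumptions not stated in the text: an approximate Planck length of 1.616e-35 m, and the standard Bekenstein-Hawking entropy of A/4 (in nats, Planck units) as the precise form of "one bit per Planck area". The function names are hypothetical.

```python
import math

PLANCK_LENGTH_M = 1.616e-35  # metres; approximate value, assumed for this sketch

def holographic_bits_per_m2() -> float:
    """Entropy density A / (4 l_p^2) in nats, converted to bits per square metre."""
    planck_area = PLANCK_LENGTH_M ** 2
    return 1.0 / (4.0 * planck_area * math.log(2))

def collapse_radius(density: float) -> float:
    """Radius (Planck units, G = c = 1) at which a uniform ball of the given
    mass density equals its own Schwarzschild radius r = 2M."""
    return math.sqrt(3.0 / (8.0 * math.pi * density))
```

The radius formula follows from setting $r = 2M$ with $M = \frac{4}{3}\pi r^3 v$, which gives $r^2 = 3/(8\pi v)$.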

Actually the situation is worse than that: even a planar hard disk of constant mass density would collapse
to form a black hole once its radius became sufficiently large, $r = \Theta(1/v)$. (We assume here that the hard
disk is disc-shaped. A linear or 1-D hard disk could expand indefinitely without collapse.) It is possible,
though, that a hard disk's information content could asymptotically exceed its mass. For example, a black
hole's mass is proportional to the radius of its event horizon, but the entropy is proportional to the square
of the radius (that is, to the surface area). Admittedly, inherent difficulties with storage and retrieval make
a black hole horizon less than ideal as a hard disk. However, even a weakly-gravitating system could store
information at a rate asymptotically exceeding its mass-energy. For instance, Bousso [9] shows that an
enclosed ball of radiation with radius r can store $n = \Theta(r^{3/2})$ bits, even though its energy grows only as
r. Our results in Section 6.1 will imply that a quantum robot could (in principle!) search such a 'radiation
disk' for a marked item in time $O(r^{5/4}) = O(n^{5/6})$. This is some improvement over the trivial $O(n)$ upper
bound for a 1-D hard disk, though it falls short of the desired $O(\sqrt{n})$.

In general, if $n = r^c$ bits are scattered throughout a 3-D ball of radius r (where $c \leq 3$ and the bits'
locations are known), we will show in Theorem 30 that the time needed to search for a '1' bit grows as
$n^{1/c + 1/6} = r^{1 + c/6}$ (omitting logarithmic factors). In particular, if $n = \Theta(r^2)$ (saturating the holographic
bound), then the time grows as $n^{2/3}$ or $r^{4/3}$. To achieve a search time of $O(\sqrt{n}\, \mathrm{polylog}\, n)$, the bits would
need to be concentrated on a 2-D surface.
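
The exponents quoted above follow from the single formula $n^{1/c+1/6} = r^{1+c/6}$ with $n = r^c$. As a small illustration (the helper name is hypothetical, and logarithmic factors are ignored as in the text), exact rational arithmetic reproduces them:

```python
from fractions import Fraction

def search_time_exponents(c: Fraction):
    """For n = r**c bits in a 3-D ball, return (exponent of n, exponent of r)
    in the search time n**(1/c + 1/6) = r**(1 + c/6)."""
    n_exp = 1 / c + Fraction(1, 6)
    r_exp = 1 + c / 6
    return n_exp, r_exp
```

With c = 2 this gives exponents 2/3 and 4/3, and with c = 3/2 it gives 5/6 and 5/4, which agree numerically with the radiation-ball figures quoted earlier.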

Because of the holographic principle, we see that it is not only quantum mechanics that yields a $\Omega(\sqrt{n})$
lower bound on the number of steps needed for unordered search. If the items to be searched are laid out
spatially, then general relativity in 3 + 1 dimensions independently yields the same bound, $\Omega(\sqrt{n})$, up to a
constant factor.¹ Interestingly, in d + 1 dimensions the relativity bound would be $\Omega(n^{1/(d-1)})$, which for
$d > 3$ is weaker than the quantum mechanics bound. Given that our two fundamental theories yield the
same lower bound, it is natural to ask whether that bound is tight. The answer seems to be that it is not
tight, since (i) the entropy on a black hole horizon is not efficiently accessible², and (ii) weakly-gravitating
systems are subject to the Bekenstein bound [S], an even stronger entropy constraint than the holographic 
bound. 

Yet it is still of basic interest to know whether n bits in a radius-r ball can be searched in time
$o(\min\{n, r\sqrt{n}\})$; that is, whether it is possible to do anything better than either brute-force quantum
search (with the drawback pointed out by Benioff), or classical search. Our results show that it is
possible.

From a physical point of view, several questions naturally arise: (1) whether our complexity measure is 
realistic; (2) how to account for time dilation; and (3) whether given the number of bits we are imagining, 
cosmological bounds are also relevant. Let us address these questions in turn. 

(1) One could argue that to maintain a 'quantum database' of size n requires n computing elements ([6'2\. 
though see also P3]). So why not just exploit those elements to search the database in parallel! 
Then it becomes trivial to show that the search time is limited only by the radius of the database, so
the algorithms of this paper are unnecessary. Our response is that, while there might be n 'passive'
computing elements (capable of storing data), there might be many fewer 'active' elements, which
we consequently wish to place in a superposition over locations. This assumption seems physically
unobjectionable. For a particle (and indeed any object) really does have an indeterminate location,
not merely an indeterminate internal state (such as spin) at some location. We leave as an open
problem, however, whether our assumption is valid for specific quantum computer architectures such
as ion traps.

1 Admittedly, the holographic principle is part of quantum gravity and not general relativity per se. All that matters for us,
though, is that the principle seems logically independent of quantum-mechanical linearity, which is what produces the "other"
$\Omega(\sqrt{n})$ bound.

2 In the case of a black hole horizon, waiting for the bits to be emitted as Hawking radiation (as recent evidence suggests
that they are [27]) takes time proportional to $r^3$, which is much too long.

(2) So long as we invoke general relativity, should we not also consider the effects of time dilation? Those 
effects are indeed pronounced near a black hole horizon. Again, though, for our upper bounds we will 
have in mind systems far from the Schwarzschild limit, for which any time dilation is by at most a 
constant factor independent of n. 

(3) How do cosmological considerations affect our analysis? Bousso argues that, in a spacetime with
positive cosmological constant $\Lambda > 0$, the total number of bits accessible to any one experiment is at
most $3\pi / (\Lambda \ln 2)$, or roughly $10^{122}$ given current experimental bounds [23] on $\Lambda$.³ Intuitively, even if
the universe is spatially infinite, most of it recedes too quickly from any one observer to be harnessed 
as computer memory. 

One response to this result is to assume an idealization in which $\Lambda$ vanishes, although Planck's constant
h does not vanish. As justification, one could argue that without the idealization $\Lambda = 0$, all asymptotic
bounds in computer science are basically fictions. But perhaps a better response is to accept the
$3\pi / (\Lambda \ln 2)$ bound, and then ask how close one can come to saturating it in different scenarios.
Classically, the maximum number of bits that can be searched is, in a crude model⁴, actually proportional
to $1/\sqrt{\Lambda} \approx 10^{61}$ rather than $1/\Lambda$. The reason is that if a region had much more than $1/\sqrt{\Lambda}$ bits, then
after $1/\sqrt{\Lambda}$ Planck times (that is, about $10^{10}$ years, or roughly the current age of the universe) most
of the region would have receded beyond one's cosmological horizon. What our results suggest is
that, using a quantum robot, one could come closer to saturating the cosmological bound, since, for
example, a 2-D region of size $1/\Lambda$ can be searched in time $O\left(\frac{1}{\sqrt{\Lambda}}\, \mathrm{polylog}\, \frac{1}{\sqrt{\Lambda}}\right)$. How anyone could
prepare a database of size much greater than $1/\sqrt{\Lambda}$ remains unclear, but if such a database existed, it
could be searched!

3 The Model 

Much of what is known about the power of quantum computing comes from the black-box or query model 
[21Ell7i rr71l29| . in which one counts only the number of queries to an oracle, not the number of computational 
steps. We will take this model as the starting point for a formal definition of quantum robots. Doing so 
will focus attention on our main concern: how much harder is it to evaluate a function when its inputs are 
spatially separated? As it turns out, all of our algorithms will be efficient as measured by the number of 
gates and auxiliary qubits needed to implement them. 

For simplicity, we assume that a robot's goal is to evaluate a Boolean function $f : \{0,1\}^n \to \{0,1\}$, which
could be partial or total. A 'region of space' is a connected undirected graph $G = (V, E)$ with vertices
$V = \{v_1, \ldots, v_n\}$. Let $X = x_1 \ldots x_n \in \{0,1\}^n$ be an input to f; then each bit $x_i$ is available only at vertex
$v_i$. We assume the robot knows G and the vertex labels in advance, and so is ignorant only of the $x_i$ bits.
We thus sidestep a major difficulty for quantum walks pQ, which is how to ensure that a process on an 
unknown graph is unitary. 

3 Also, Lloyd [21] argues that the total number of bits accessible up till now is at most the square of the number of Planck
times elapsed so far, or about $(10^{61})^2 = 10^{122}$. Lloyd's bound, unlike Bousso's, does not depend on $\Lambda$ being positive. The
numerical coincidence between the two bounds reflects the experimental finding that we live in a transitional era, when
both $\Lambda$ and "dust" contribute significantly to the universe's net energy balance ($\Omega_\Lambda \approx 0.7$, $\Omega_{\mathrm{dust}} \approx 0.3$). In earlier times dust
(and before that radiation) dominated, and Lloyd's bound was tighter. In later times $\Lambda$ will dominate, and Bousso's bound
will be tighter. Why we should live in such a transitional era is unknown.

4 Specifically, neglecting gravity and other forces that could counteract the effect of A. 
At any time, the robot's state has the form

$$\sum_{i,z} \alpha_{i,z} \, |v_i, z\rangle .$$

Here $v_i \in V$ is a vertex, representing the robot's location; and z is a bit string (which can be arbitrarily
long), representing the robot's internal configuration. The state evolves via an alternating sequence of T
algorithm steps and T oracle steps:

$$U^{(1)} \to O^{(1)} \to U^{(2)} \to \cdots \to U^{(T)} \to O^{(T)} .$$

An oracle step $O^{(t)}$ maps each basis state $|v_i, z\rangle$ to $|v_i, z \oplus x_i\rangle$, where $x_i$ is exclusive-OR'ed into the first
bit of z. An algorithm step $U^{(t)}$ can be any unitary matrix that (1) does not depend on X, and (2) acts
'locally' on G. How to make the second condition precise is the subject of Section 3.1.

The initial state of the algorithm is $|v_1, 0\rangle$. Let $\alpha^{(t)}_{i,z}(X)$ be the amplitude of $|v_i, z\rangle$ immediately after
the t-th oracle step; then the algorithm succeeds with probability $1 - \varepsilon$ if

$$\sum_{|v_i, z\rangle \,:\, z_{OUT} = f(X)} \left| \alpha^{(T)}_{i,z}(X) \right|^2 \geq 1 - \varepsilon$$

for all inputs X, where $z_{OUT}$ is a bit of z representing the output.
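
To make the oracle step concrete, here is a minimal sketch of its action on a state vector. The representation is an assumption made for illustration only: the state is a dictionary from (vertex index, bit tuple) to amplitude, vertex indices are 0-based, and the function name is hypothetical.

```python
def oracle_step(state, x):
    """Apply the oracle: |v_i, z> -> |v_i, z'>, where z' is z with x[i]
    XORed into its first bit.

    state maps (i, z) to an amplitude, with i a 0-based vertex index and
    z a non-empty tuple of bits; x is the list of input bits.
    """
    new_state = {}
    for (i, z), amp in state.items():
        flipped = (z[0] ^ x[i],) + z[1:]
        new_state[(i, flipped)] = new_state.get((i, flipped), 0) + amp
    return new_state
```

Because the map only permutes basis states, it preserves the norm, and applying it twice restores the original state.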



3.1 Locality Criteria 

Classically, it is easy to decide whether a stochastic matrix acts locally with respect to a graph G: it does 
if it moves probability only along the edges of G. In the quantum case, however, interference makes the 
question much more subtle. In this section we propose three criteria for whether a unitary matrix U is local. 
Our algorithms will then be implemented using the most restrictive of these criteria. 

The first criterion we call Z-locality (for zero): U is Z-local if, given any pair of non-neighboring vertices
$v_1, v_2$ in G, U "sends no amplitude" from $v_1$ to $v_2$; that is, the corresponding entries in U are all 0. The
second criterion, C-locality (for composability), says that this is not enough: not only must U send amplitude
only between neighboring vertices, but it must be composed of a product of commuting unitaries, each of
which acts on a single edge. The third criterion is perhaps the most natural one to a physicist: U is H-local
(for Hamiltonian) if it can be obtained by applying a locally-acting, low-energy Hamiltonian for some fixed
amount of time. More formally, let $U_{i,z \to i^*,z^*}$ be the entry in the $|v_i, z\rangle$ column and $|v_{i^*}, z^*\rangle$ row of U.

Definition 1 U is Z-local if $U_{i,z \to i^*,z^*} = 0$ whenever $i \neq i^*$ and $(v_i, v_{i^*})$ is not an edge of G.

Definition 2 U is C-local if the basis states can be partitioned into subsets $P_1, \ldots, P_q$ such that

(i) $U_{i,z \to i^*,z^*} = 0$ whenever $|v_i, z\rangle$ and $|v_{i^*}, z^*\rangle$ belong to distinct $P_j$'s, and

(ii) for each j, all basis states in $P_j$ are either from the same vertex or from two adjacent vertices.

Definition 3 U is H-local if $U = e^{iH}$ for some Hermitian H with eigenvalues of absolute value at most $\pi$,
such that $H_{i,z \to i^*,z^*} = 0$ whenever $i \neq i^*$ and $(v_i, v_{i^*})$ is not an edge in E.
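
Of the three criteria, Z-locality is the simplest to test mechanically, since it only constrains which matrix entries may be nonzero. The sketch below is illustrative only; the matrix layout, the `vertex_of` list, and the function name are assumptions, not notation from the paper.

```python
def is_z_local(U, vertex_of, edges, tol=1e-12):
    """Check Definition 1 for a square matrix U given as a list of rows.

    U[row][col] is the amplitude sent from basis state `col` to basis state `row`.
    vertex_of[k] is the vertex index of basis state k.
    edges is a set of frozensets {i, j}, one per undirected edge of G.
    """
    size = len(U)
    for row in range(size):
        for col in range(size):
            i, i_star = vertex_of[col], vertex_of[row]
            if i != i_star and frozenset((i, i_star)) not in edges:
                if abs(U[row][col]) > tol:
                    return False  # amplitude moved between non-neighbors
    return True
```

For example, on the path graph 0 - 1 - 2 with one basis state per vertex, swapping vertices 0 and 1 passes the check, whereas swapping vertices 0 and 2 does not.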

If a unitary matrix is C-local, then it is also Z-local and H-local. For the latter implication, note that
any unitary U can be written as $e^{iH}$ for some H with eigenvalues of absolute value at most $\pi$. So we can
write the unitary $U_j$ acting on each $P_j$ as $e^{iH_j}$; then since the $U_j$'s commute,

$$\prod_j U_j = e^{i \sum_j H_j} .$$

Beyond that, though, how are the locality criteria related? Are they approximately equivalent? If not, 
then does a problem's complexity in our model ever depend on which criterion is chosen? Let us emphasize
that these questions are not answered by, for example, the Solovay-Kitaev theorem, that an $n \times n$
unitary matrix can be approximated using a number of gates polynomial in n. For recall that the definition
of C-locality requires the edgewise operations to commute; indeed, without that requirement, one could
produce any unitary matrix at all. So the relevant question, which we leave open, is whether any Z-local or
H-local unitary can be approximated by a product of, say, $O(\log n)$ C-local unitaries. (A product of $O(n)$
such unitaries trivially suffices, but that is far too many.)

4 General Bounds 

Given a Boolean function $f : \{0,1\}^n \to \{0,1\}$, the quantum query complexity $Q(f)$, defined by Beals et al.
[2], is the minimum T for which there exists a T-query quantum algorithm that evaluates f with probability
at least 2/3 on all inputs. (We will always be interested in the two-sided, bounded-error complexity,
sometimes denoted $Q_2(f)$.) Similarly, given a graph G with n vertices labeled $1, \ldots, n$, we let $Q(f, G)$ be
the minimum T for which there exists a T-query quantum robot on G that evaluates f with probability 2/3.
Here we require the algorithm steps to be C-local. One might also consider the corresponding measures
$Q^Z(f, G)$ and $Q^H(f, G)$ with Z-local and H-local steps respectively. Clearly $Q(f, G) \geq Q^Z(f, G)$ and
$Q(f, G) \geq Q^H(f, G)$; we conjecture that all three measures are asymptotically equivalent but were unable
to prove this.

Let $\delta_G$ be the diameter of G, and call f nondegenerate if it depends on all n input bits.

Proposition 4 For all f, G,

(i) $Q(f, G) \leq 2n - 3$.

(ii) $Q(f, G) \leq (2\delta_G + 1)\, Q(f)$.

(iii) $Q(f, G) \geq Q(f)$.

(iv) $Q(f, G) \geq \delta_G / 2$ if f is nondegenerate.

Proof.

(i) Starting from the root, a spanning tree for G can be traversed in $2(n - 1) - 1$ steps (there is no need
to return to the root).

(ii) We can simulate a query in $2\delta_G$ steps, by fanning out from the start vertex $v_1$ and then returning.
Applying a unitary at $v_1$ takes 1 step.

(iii) Obvious.

(iv) There exists a vertex $v_i$ whose distance to $v_1$ is at least $\delta_G / 2$, and f could depend on $x_i$.
■
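
The count in part (i) can be illustrated on explicit trees. The following sketch is an illustration under stated assumptions: the tree is given as a dictionary from each vertex to its list of children, the helper names are hypothetical, and the walk is assumed to finish at a deepest leaf, which saves the tree's height relative to a full round trip of $2(n - 1)$ moves.

```python
def tree_size(tree, v):
    """Number of vertices in the subtree rooted at v."""
    return 1 + sum(tree_size(tree, c) for c in tree.get(v, []))

def tree_height(tree, v):
    """Number of edges on the longest downward path from v (0 for a leaf)."""
    return 1 + max((tree_height(tree, c) for c in tree.get(v, [])), default=-1)

def traversal_steps(tree, root):
    """Edge moves for a depth-first walk that visits every vertex and stops
    at a deepest leaf instead of returning to the root."""
    n = tree_size(tree, root)
    return 2 * (n - 1) - tree_height(tree, root)
```

For any tree with at least two vertices the height is at least 1, so the result never exceeds $2n - 3$; a star attains that bound, while a path needs only $n - 1$ moves.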

We now show that the model is robust.

Proposition 5 For nondegenerate f, the following change $Q(f, G)$ by at most a constant factor.

(i) Replacing the initial state $|v_1, 0\rangle$ by an arbitrary (known) $|\psi\rangle$.

(ii) Requiring the final state to be localized at some vertex $v_i$ with probability at least $1 - \varepsilon$, for a constant
$\varepsilon > 0$.

(iii) Allowing multiple algorithm steps between each oracle step (and measuring the complexity by the number
of algorithm steps).

Proof.

(i) We can transform $|v_1, 0\rangle$ to $|\psi\rangle$ (and hence $|\psi\rangle$ to $|v_1, 0\rangle$) in $\delta_G = O(Q(f, G))$ steps, by fanning out
from $v_1$ along the edges of a minimum-height spanning tree.

(ii) Assume without loss of generality that $z_{OUT}$ is accessed only once, to write the output. Then after
$z_{OUT}$ is accessed, uncompute (that is, run the algorithm backwards) to localize the final state at $v_1$.
The state can then be localized at any $v_i$ in $\delta_G = O(Q(f, G))$ steps. We can succeed with any constant
probability by repeating this procedure a constant number of times.

(iii) The oracle step O is its own inverse, so we can implement a sequence $U_1, U_2, \ldots$ of algorithm steps as
follows (where I is the identity):

$$U_1 \to O \to I \to O \to U_2 \to \cdots$$


A function of particular interest is $f = \mathrm{OR}(x_1, \ldots, x_n)$, which outputs 1 if and only if $x_i = 1$ for some i.
We first give a general upper bound on Q (OR, G) in terms of the diameter of G. (Throughout the paper, 
we sometimes omit floor and ceiling signs if they clearly have no effect on the asymptotics.) 

Proposition 6

$$Q(\mathrm{OR}, G) = O\left(\sqrt{n \delta_G}\right).$$

Proof. Let $\tau$ be a minimum-height spanning tree for G, rooted at $v_1$. A depth-first search on $\tau$ uses
$2n - 2$ steps. Let $S_1$ be the set of vertices visited by depth-first search in steps 1 to $\delta_G$, $S_2$ be those visited
in steps $\delta_G + 1$ to $2\delta_G$, and so on. Then

$$S_1 \cup \cdots \cup S_{2n/\delta_G} = V.$$

Furthermore, for each $S_j$ there is a classical algorithm $A_j$, using at most $3\delta_G$ steps, that starts at $v_1$, ends
at $v_1$, and outputs '1' if and only if $x_i = 1$ for some $v_i \in S_j$. Then we simply perform Grover search at
$v_1$ over all $A_j$; since each iteration takes $O(\delta_G)$ steps and there are $O\left(\sqrt{2n/\delta_G}\right)$ iterations, the number of
steps is $O\left(\sqrt{n \delta_G}\right)$. ■
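
The accounting in this proof can be spelled out numerically. The sketch below is only a rough illustration: the function name is hypothetical, and the constant hidden in the number of Grover iterations is taken to be 1, which is an assumption.

```python
import math

def or_search_steps(n: int, delta: int) -> int:
    """Rough step count for the Proposition 6 strategy on a graph with
    n vertices and diameter delta (Grover constant assumed to be 1)."""
    segments = -(-(2 * n - 2) // delta)          # ceil: number of sets S_j
    iterations = math.ceil(math.sqrt(segments))  # Grover iterations over the A_j
    return iterations * 3 * delta                # each A_j takes at most 3*delta steps
```

For a 100 x 100 grid (n = 10^4, diameter 198) this gives 6534 steps, within a small constant factor of $\sqrt{n\delta} \approx 1407$.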

The bound of Proposition 6 is tight:

Theorem 7 For all $\delta$, there exists a graph G with diameter $\delta_G = \delta$ such that

$$Q(\mathrm{OR}, G) = \Omega\left(\sqrt{n \delta}\right).$$



Proof. Let G be a 'starfish' with central vertex $v_1$ and $M = 2(n - 1)/\delta$ legs $L_1, \ldots, L_M$, each of length
$\delta/2$ (see Figure 2). We use the hybrid argument of Bennett et al. Suppose we run the algorithm on the
all-zero input $X_0$. Then define the query magnitude $\Gamma_j^{(t)}$ to be the probability of finding the robot in leg $L_j$
immediately after the t-th query:

$$\Gamma_j^{(t)} = \sum_{v_i \in L_j} \sum_z \left| \alpha^{(t)}_{i,z}(X_0) \right|^2 .$$

Let T be the total number of queries, and let $w = T/(c\delta)$ for some constant $0 < c < 1/2$. Clearly

$$\sum_{q=0}^{w-1} \sum_{j=1}^{M} \Gamma_j^{(T - qc\delta)} \leq \sum_{q=0}^{w-1} 1 = w .$$

Hence there must exist a leg $L_{j^*}$ such that

$$\sum_{q=0}^{w-1} \Gamma_{j^*}^{(T - qc\delta)} \leq \frac{w}{M} = \frac{w\delta}{2(n - 1)} .$$
Figure 2: The 'starfish' graph G. The marked item is at one of the tip vertices.



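Let $v_{i^*}$ be the tip vertex of $L_{j^*}$, and let Y be the input which is 1 at $v_{i^*}$ and 0 elsewhere. Then let $X_q$ be
a hybrid input, which is $X_0$ during queries 1 to $T - qc\delta$, but Y during queries $T - qc\delta + 1$ to T. Also, let

$$|\psi^{(t)}(X_q)\rangle = \sum_{i,z} \alpha^{(t)}_{i,z}(X_q) \, |v_i, z\rangle$$

be the algorithm's state after t queries when run on $X_q$, and let

$$D(q, r) = \left\| \, |\psi^{(T)}(X_q)\rangle - |\psi^{(T)}(X_r)\rangle \, \right\|_2^2 = \sum_{v_i \in G} \sum_z \left| \alpha^{(T)}_{i,z}(X_q) - \alpha^{(T)}_{i,z}(X_r) \right|^2 .$$

Then for all $q \geq 1$, we claim that $D(q - 1, q) \leq 4\Gamma_{j^*}^{(T - qc\delta)}$. For by unitarity, the Euclidean distance between
$|\psi^{(t)}(X_{q-1})\rangle$ and $|\psi^{(t)}(X_q)\rangle$ can only increase as a result of queries $T - qc\delta + 1$ through $T - (q - 1)c\delta$. But
no amplitude from outside $L_{j^*}$ can reach $v_{i^*}$ during that interval, since the distance is $\delta/2$ and there are
only $c\delta < \delta/2$ time steps. Therefore, switching from $X_{q-1}$ to $X_q$ can only affect amplitude that is in $L_{j^*}$
immediately after query $T - qc\delta$:

$$D(q - 1, q) \leq \sum_{v_i \in L_{j^*}} \sum_z \left| \alpha^{(T - qc\delta)}_{i,z}(X_q) - \left( -\alpha^{(T - qc\delta)}_{i,z}(X_q) \right) \right|^2 = 4 \sum_{v_i \in L_{j^*}} \sum_z \left| \alpha^{(T - qc\delta)}_{i,z}(X_0) \right|^2 = 4\Gamma_{j^*}^{(T - qc\delta)} .$$

It follows that

$$\sqrt{D(0, w)} \leq \sum_{q=1}^{w} \sqrt{D(q - 1, q)} \leq 2 \sum_{q=1}^{w} \sqrt{\Gamma_{j^*}^{(T - qc\delta)}} \leq 2w \sqrt{\frac{\delta}{2(n - 1)}} = \frac{T}{c} \sqrt{\frac{2}{\delta (n - 1)}} .$$

Here the first inequality uses the triangle inequality, and the third uses the Cauchy-Schwarz inequality. Now
assuming the algorithm is correct we need $D(0, w) = \Omega(1)$, which implies that $T = \Omega\left(\sqrt{n\delta}\right)$. ■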

It is immediate that Theorem 7 applies to Z-local unitaries as well as C-local ones: that is, $Q^Z(\mathrm{OR}, G) =
\Omega\left(\sqrt{n\delta}\right)$. We believe the theorem can be extended to H-local unitaries as well, but a full discussion of this
issue would take us too far afield.

5 Search on Grids



Let $\mathcal{L}_d(n)$ be a d-dimensional grid graph of size $n^{1/d} \times \cdots \times n^{1/d}$. That is, each vertex is specified by d
coordinates $i_1, \ldots, i_d \in \{1, \ldots, n^{1/d}\}$, and is connected to the at most 2d vertices obtainable by adding or
subtracting 1 from a single coordinate (boundary vertices have fewer than 2d neighbors). We write simply $\mathcal{L}_d$
when n is clear from context. In this section we present our main positive results: that $Q(\mathrm{OR}, \mathcal{L}_d) = O(\sqrt{n})$
for $d \geq 3$, and $Q(\mathrm{OR}, \mathcal{L}_2) = O(\sqrt{n}\, \mathrm{polylog}\, n)$ for $d = 2$.

Before proving these claims, let us develop some intuition by showing weaker bounds, taking the case
$d = 2$ for illustration. Clearly $Q(\mathrm{OR}, \mathcal{L}_2) = O(n^{3/4})$: we simply partition $\mathcal{L}_2(n)$ into $\sqrt{n}$ subsquares,
each a copy of $\mathcal{L}_2(\sqrt{n})$. In $5\sqrt{n}$ steps, the robot can travel from the start vertex to any subsquare C,
search C classically for a marked vertex, and then return to the start vertex. Thus, by searching all
$\sqrt{n}$ of the C's in superposition and applying Grover's algorithm, the robot can search the grid in time
$O(n^{1/4}) \times 5\sqrt{n} = O(n^{3/4})$.

Once we know that, we might as well partition $\mathcal{L}_2(n)$ into $n^{1/3}$ subsquares, each a copy of $\mathcal{L}_2(n^{2/3})$.
Searching any one of these subsquares by the previous algorithm takes time $O\left((n^{2/3})^{3/4}\right) = O(\sqrt{n})$, an
amount of time that also suffices to travel to the subsquare and back from the start vertex. So using Grover's
algorithm, the robot can search $\mathcal{L}_2(n)$ in time $O\left(\sqrt{n^{1/3}} \cdot \sqrt{n}\right) = O(n^{2/3})$. We can continue recursively in
this manner to make the running time approach $O(\sqrt{n})$. The trouble is that, with each additional layer
of recursion, the robot needs to repeat the search more often to upper-bound the error probability. Using
this approach, the best bounds we could obtain are roughly $O(\sqrt{n}\, \mathrm{polylog}\, n)$ for $d \geq 3$, or $\sqrt{n}\, 2^{O(\sqrt{\log n})}$ for
$d = 2$. In what follows, we use the amplitude amplification approach of Grover [19] and Brassard et al.
[11] to improve these bounds, in the case of a single marked vertex, to $O(\sqrt{n})$ for $d \geq 3$ (Section 5.2) and
$O(\sqrt{n} \log^{3/2} n)$ for $d = 2$ (Section 5.3). Section 5.4 generalizes these results to the case of multiple marked
vertices.
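
The recursion sketched above can be followed exactly at the level of exponents. The code below is an illustrative sketch, not the paper's algorithm: it ignores constants and the error-reduction overhead mentioned in the text, and the function names are hypothetical. If the previous level searches a grid of m vertices in time $m^e$, the subsquare size is chosen so that searching one subsquare costs $\sqrt{n}$, matching the travel time.

```python
from fractions import Fraction

def next_exponent(e: Fraction) -> Fraction:
    """One level of the d = 2 recursion, tracked as exponents of n."""
    sub_size = 1 / (2 * e)   # subsquare has n**sub_size vertices, so (n**sub_size)**e = n**(1/2)
    count = 1 - sub_size     # there are n**count subsquares
    return count / 2 + Fraction(1, 2)  # sqrt(count) Grover iterations, each costing n**(1/2)

def exponents(levels: int):
    """Exponents after 1, 2, ..., levels rounds, starting from classical search (e = 1)."""
    e, out = Fraction(1), []
    for _ in range(levels):
        e = next_exponent(e)
        out.append(e)
    return out
```

The first two values, 3/4 and 2/3, are the ones derived in the text; the sequence continues 5/8, 3/5, ... and decreases toward 1/2.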

Intuitively, the reason the case $d = 2$ is special is that there, the diameter of the grid is $\Theta(\sqrt{n})$, which
matches exactly the time needed for Grover search. For $d \geq 3$, by contrast, the robot can travel across the
grid in much less time than is needed to search it.

5.1 Amplitude Amplification 

We start by describing amplitude amplification [11, 19], a generalization of Grover search. Let $U$ be a quantum algorithm that, with probability $\epsilon$, outputs a correct answer together with a witness that proves the answer correct. (For example, in the case of search, the algorithm outputs a vertex label $i$ such that $x_i = 1$.) Amplification generates a new algorithm that calls $U$ order $1/\sqrt{\epsilon}$ times, and that produces both a correct answer and a witness with probability $\Omega(1)$. In particular, assume $U$ starts in basis state $|s\rangle$, and let $m$ be a positive integer. Then the amplification procedure works as follows:

(1) Set $|\psi_0\rangle = U|s\rangle$.

(2) For $i = 1$ to $m$, set $|\psi_i\rangle = U S U^{-1} W |\psi_{i-1}\rangle$, where

• $W$ flips the phase of basis state $|y\rangle$ if and only if $|y\rangle$ contains a description of a correct witness, and

• $S$ flips the phase of basis state $|y\rangle$ if and only if $|y\rangle = |s\rangle$.

We can decompose $|\psi_0\rangle$ as $\sin\alpha\,|\Psi_{\mathrm{succ}}\rangle + \cos\alpha\,|\Psi_{\mathrm{fail}}\rangle$, where $|\Psi_{\mathrm{succ}}\rangle$ is a superposition over basis states containing a correct witness and $|\Psi_{\mathrm{fail}}\rangle$ is a superposition over all other basis states. Brassard et al. [11] showed the following:

Lemma 8 ([11]) $|\psi_i\rangle = \sin[(2i+1)\alpha]\,|\Psi_{\mathrm{succ}}\rangle + \cos[(2i+1)\alpha]\,|\Psi_{\mathrm{fail}}\rangle$.

If measuring $|\psi_0\rangle$ gives a correct witness with probability $\epsilon$, then $|\sin\alpha|^2 = \epsilon$ and $|\alpha| \ge \sqrt{\epsilon}$. So taking $m = O(1/\sqrt{\epsilon})$ yields $\sin[(2m+1)\alpha] \approx 1$. For our algorithms, though, the multiplicative constant under the big-O also matters. To upper-bound this constant, we prove the following lemma.
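As a sanity check on Lemma 8, the amplification loop can be simulated on a small state vector. The sketch below is illustrative only: it assumes that $U|s\rangle$ is the uniform superposition over `n_items` basis states (plain Grover search), and it compares success probabilities rather than amplitudes, so a global sign picked up in each round does not matter. The function names are our own.

```python
import math

def amplify(n_items, marked, steps):
    """Return the success probabilities after 0, 1, ..., steps rounds of amplification.

    U|s> is taken to be the uniform superposition over n_items basis states
    (an assumption made for this toy example).
    """
    psi0 = [1 / math.sqrt(n_items)] * n_items
    psi = list(psi0)
    probs = []
    for _ in range(steps + 1):
        probs.append(sum(psi[y] ** 2 for y in marked))
        # W: flip the phase of every basis state that holds a witness.
        psi = [-a if y in marked else a for y, a in enumerate(psi)]
        # U S U^{-1} = I - 2|psi0><psi0|: flip the component along the initial state.
        overlap = sum(a * b for a, b in zip(psi0, psi))
        psi = [a - 2 * overlap * b for a, b in zip(psi, psi0)]
    return probs

def predicted(n_items, n_marked, i):
    """Lemma 8: success probability sin^2((2i+1) alpha), where sin^2(alpha) = eps."""
    alpha = math.asin(math.sqrt(n_marked / n_items))
    return math.sin((2 * i + 1) * alpha) ** 2
```

For 64 items and one marked item, six rounds already bring the simulated success probability above 0.99, in agreement with the closed form.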

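Lemma 9 Suppose a quantum algorithm $U$ outputs a correct answer and witness with probability exactly $\epsilon$. Then by using $2m + 1$ calls to $U$ or $U^{-1}$, where

$$ m \le \frac{\pi}{4\arcsin\sqrt{\epsilon}} - \frac{1}{2}, $$

we can output a correct answer and witness with probability at least

$$ \left(1 - \frac{(2m+1)^2}{3}\,\epsilon\right)(2m+1)^2\,\epsilon. $$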

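Proof. We perform $m$ steps of amplitude amplification, which requires $2m + 1$ calls to $U$ or $U^{-1}$. By Lemma 8, this yields the final state

$$ \sin[(2m+1)\alpha]\,|\Psi_{\mathrm{succ}}\rangle + \cos[(2m+1)\alpha]\,|\Psi_{\mathrm{fail}}\rangle, $$

where $\alpha = \arcsin\sqrt{\epsilon}$. Therefore the success probability is

$$ \begin{aligned}
\sin^2\left[(2m+1)\arcsin\sqrt{\epsilon}\right] &\ge \sin^2\left[(2m+1)\sqrt{\epsilon}\right] \\
&\ge \left((2m+1)\sqrt{\epsilon} - \frac{(2m+1)^3}{6}\,\epsilon^{3/2}\right)^2 \\
&\ge (2m+1)^2\,\epsilon - \frac{(2m+1)^4}{3}\,\epsilon^2 .
\end{aligned} $$

Here the first line uses the monotonicity of $\sin^2 x$ in the interval $[0, \pi/2]$, and the second line uses the fact that $\sin x \ge x - x^3/6$ for all $x \ge 0$ by Taylor series expansion. The third line drops the nonnegative term $(2m+1)^6\epsilon^3/36$ from the expanded square. ■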
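The bound of Lemma 9 is easy to check numerically against the exact success probability. The following sketch is only a spot check on a few values of $\epsilon$, not a proof; the helper names `max_steps`, `exact_success` and `lemma9_bound` are hypothetical.

```python
import math

def max_steps(eps):
    """Largest integer m with m <= pi / (4 asin(sqrt(eps))) - 1/2."""
    return math.floor(math.pi / (4 * math.asin(math.sqrt(eps))) - 0.5)

def exact_success(eps, m):
    """Exact success probability after m amplification steps (Lemma 8)."""
    return math.sin((2 * m + 1) * math.asin(math.sqrt(eps))) ** 2

def lemma9_bound(eps, m):
    """Lower bound (1 - (2m+1)^2 eps / 3) (2m+1)^2 eps from Lemma 9."""
    x = (2 * m + 1) ** 2 * eps
    return (1 - x / 3) * x
```

For every admissible $m$ the exact value should sit at or above the bound, with the gap growing as $(2m+1)^2\epsilon$ approaches 1.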

Note that there is no need to uncompute any garbage left by $U$, beyond the uncomputation that happens "automatically" within the amplification procedure.



5.2 Dimension At Least 3 

Our goal is the following: 

Theorem 10 If $d \ge 3$, then $Q(\mathrm{OR}, \mathcal{L}_d) = \Theta(\sqrt{n})$.

In this section, we prove Theorem 10 for the special case of a unique marked vertex; then, in Sections 5.4 and 5.5, we will generalize to multiple marked vertices. Let $\mathrm{OR}^{(k)}$ be the problem of deciding whether there are no marked vertices or exactly $k$ of them, given that one of these is true. Then:

Theorem 11 If $d \ge 3$, then $Q(\mathrm{OR}^{(1)}, \mathcal{L}_d) = \Theta(\sqrt{n})$.



Choose constants $\beta \in (2/3, 1)$ and $\mu \in (1/3, 1/2)$ such that $\beta\mu > 1/3$ (for example, $\beta = 4/5$ and $\mu = 5/11$ will work). Let $\ell_0$ be a large positive integer; then for all positive integers $R$, let $\ell_R = \ell_{R-1}\lceil \ell_{R-1}^{1/\beta - 1} \rceil$. Also let $n_R = \ell_R^d$. Assume for simplicity that $n = n_R$ for some $R$; in other words, that the hypercube $\mathcal{L}_d(n_R)$ to be searched has sides of length $\ell_R$. Later we will remove this assumption.

Consider the following recursive algorithm $A$. If $n = n_0$, then search $\mathcal{L}_d(n_0)$ classically, returning 1 if a marked vertex is found and 0 otherwise. Otherwise partition $\mathcal{L}_d(n_R)$ into $n_R/n_{R-1}$ subcubes, each one a copy of $\mathcal{L}_d(n_{R-1})$. Take the algorithm that consists of picking a subcube $C$ uniformly at random, and then running $A$ recursively on $C$. Amplify this algorithm $(n_R/n_{R-1})^{\mu}$ times.

The intuition behind the exponents is that $n_{R-1} \approx n_R^{\beta}$, so searching $\mathcal{L}_d(n_{R-1})$ should take about $n_R^{\beta/2}$ steps, which dominates the $n_R^{1/d}$ steps needed to travel across the hypercube when $d \ge 3$. Also, at level $R$ we want to amplify a number of times that is less than $(n_R/n_{R-1})^{1/2}$ by some polynomial amount, since full amplification would be inefficient. The reason for the constraint $\beta\mu > 1/3$ will appear in the analysis.
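To make the parameter choice concrete, the sketch below generates a schedule of side lengths in exact integer arithmetic. It assumes the recurrence $\ell_R = \ell_{R-1}\lceil \ell_{R-1}^{1/\beta - 1}\rceil$, which is one schedule that keeps $\ell_{R-1}$ a divisor of $\ell_R$ and gives $n_{R-1} \le n_R^{\beta}$. The starting value 16 is an arbitrary small example rather than the "large" $\ell_0$ the analysis calls for, and the function names are hypothetical.

```python
from fractions import Fraction

def ceil_root_power(base, num, den):
    """Smallest integer c with c**den >= base**num, i.e. ceil(base**(num/den))."""
    target = base ** num
    c = 1
    while c ** den < target:
        c += 1
    return c

def side_lengths(l0, beta, levels):
    """Return [l_0, ..., l_levels] with l_R = l_{R-1} * ceil(l_{R-1}**(1/beta - 1)).

    beta is a Fraction p/q, so the exponent 1/beta - 1 equals (q - p)/p.
    The recurrence itself is an assumption of this sketch.
    """
    p, q = beta.numerator, beta.denominator
    lengths = [l0]
    for _ in range(levels):
        prev = lengths[-1]
        lengths.append(prev * ceil_root_power(prev, q - p, p))
    return lengths
```

With $\beta = 4/5$ the multiplier at each level is $\lceil \ell_{R-1}^{1/4} \rceil$, so the side lengths grow slowly at first and then faster and faster as $R$ increases.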

We now provide a more explicit description of $A$, which shows that it can be implemented using C-local unitaries and only a single bit of workspace. At any time, the quantum robot's state will have the form $\sum_{i,z} \alpha_{i,z}\,|v_i, z\rangle$, where $v_i$ is a vertex of $\mathcal{L}_d(n_R)$ and $z$ is a single bit that records whether or not a marked vertex has been found. Given a subcube $C$, let $v(C)$ be the "corner" vertex of $C$; that is, the vertex that is minimal in all $d$ coordinates. Then the initial state when searching $C$ will be $|v(C), 0\rangle$. Beware, however, that "initial state" in this context just means the state $|s\rangle$ from Section 5.1. Because of the way amplitude amplification works, $A$ will often be invoked on $C$ with other initial states, and even run in reverse.

For convenience, we will implement $A$ using a two-stage recursion: given any subcube, the task of $A$ will be to amplify the result of another procedure called $U$, which in turn runs $A$ recursively on smaller subcubes. We will also use the conditional phase flips $W$ and $S$ from Section 5.1. For convenience, we write $A_R$, $U_R$, $W_R$, $S_R$ to denote the level of recursion that is currently active. Thus, $A_R$ calls $U_R$, which calls $A_{R-1}$, which calls $U_{R-1}$, and so on down to $A_0$.

Algorithm 12 ($A_R$) Searches a subcube $C$ of size $n_R$ for the marked vertex, and amplifies the result to have larger probability. Default initial state: $|v(C), 0\rangle$.

If $R = 0$ then:

(1) Use classical C-local operations to visit all $n_0$ vertices of $C$ in any order. At each $v_i \in C$, use a query transformation to map the state $|v_i, z\rangle$ to $|v_i, z \oplus x_i\rangle$.

(2) Return to $v(C)$.

If $R \ge 1$ then:

(1) Let $m_R$ be the smallest integer such that $2m_R + 1 \ge (n_R/n_{R-1})^{\mu}$.

(2) Call $U_R$.

(3) For $i = 1$ to $m_R$, call $W_R$, then $U_R^{-1}$, then $S_R$, then $U_R$.

Suppose $A_R$ is run on the initial state $|v(C), 0\rangle$, and let $C_1, \ldots, C_{n_R/n_0}$ be the minimal subcubes in $C$, meaning those of size $n_0$. Then the final state after $A_R$ terminates should be

$$ \frac{1}{\sqrt{n_R/n_0}} \sum_{i=1}^{n_R/n_0} |v(C_i), 0\rangle $$

if $C$ does not contain the marked vertex. Otherwise the final state should have non-negligible overlap with $|v(C_{i^*}), 1\rangle$, where $C_{i^*}$ is the minimal subcube in $C$ that contains the marked vertex. In particular, if $R = 0$, then the final state should be $|v(C), 1\rangle$ if $C$ contains the marked vertex, and $|v(C), 0\rangle$ otherwise.

The two phase-flip subroutines, $W_R$ and $S_R$, are both trivial to implement. To apply $W_R$, map each basis state $|v_i, z\rangle$ to $(-1)^z\,|v_i, z\rangle$. To apply $S_R$, map each $|v_i, z\rangle$ to $-|v_i, z\rangle$ if $z = 0$ and $v_i = v(C)$ for some subcube $C$ of size $n_R$, and to $|v_i, z\rangle$ otherwise. Below we give pseudocode for $U_R$.

Algorithm 13 ($U_R$) Searches a subcube $C$ of size $n_R$ for the marked vertex. Default initial state: $|v(C), 0\rangle$.

(1) Partition $C$ into $n_R/n_{R-1}$ smaller subcubes $C_1, \ldots, C_{n_R/n_{R-1}}$, each of size $n_{R-1}$.




(2) For all $j \in \{1, \ldots, d\}$, let $V_j$ be the set of corner vertices $v(C_i)$ that differ from $v(C)$ only in the first $j$ coordinates. Thus $V_0 = \{v(C)\}$, and in general $|V_j| = (\ell_R/\ell_{R-1})^j$. For $j = 1$ to $d$, let $|V_j\rangle$ be the state

$$ |V_j\rangle = \frac{1}{\sqrt{|V_j|}} \sum_{v(C_i) \in V_j} |v(C_i), 0\rangle . $$

Apply a sequence of transformations $Z_1, Z_2, \ldots, Z_d$, where $Z_j$ is a unitary that maps $|V_{j-1}\rangle$ to $|V_j\rangle$ by applying C-local unitaries that move amplitude only along the $j$-th coordinate.

(3) Call $A_{R-1}$ recursively. (Note that this searches $C_1, \ldots, C_{n_R/n_{R-1}}$ in superposition. Also, the required amplification is performed for each of these subcubes automatically by step (3) of $A_{R-1}$.)

If $U_R$ is run on the initial state $|v(C), 0\rangle$, then the final state should be

$$ \frac{1}{\sqrt{n_R/n_{R-1}}} \sum_{i=1}^{n_R/n_{R-1}} |\phi_i\rangle , $$

where $|\phi_i\rangle$ is the correct final state when $A_{R-1}$ is run on subcube $C_i$ with initial state $|v(C_i), 0\rangle$. A key point is that there is no need for $U_R$ to call $A_{R-1}$ twice, once to compute and once to uncompute, for the uncomputation is already built into $A_R$. This is what will enable us to prove an upper bound of $O(\sqrt{n})$ instead of $O(\sqrt{n}\,2^R) = O(\sqrt{n}\,\mathrm{polylog}\,n)$.

We now analyze the running time of $A_R$.

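Lemma 14 $A_R$ uses $O(n_R^{\mu})$ steps.

Proof. Let $T_A(R)$ and $T_U(R)$ be the total numbers of steps used by $A_R$ and $U_R$ respectively in searching $\mathcal{L}_d(n_R)$. Then we have $T_A(0) = O(1)$, and

$$ \begin{aligned}
T_A(R) &\le (2m_R + 1)\,T_U(R) + 2m_R, \\
T_U(R) &\le d\,n_R^{1/d} + T_A(R-1)
\end{aligned} $$

for all $R \ge 1$. This is because $W_R$ and $S_R$ can each be implemented in a single step, while $U_R$ uses $d\ell_R = d\,n_R^{1/d}$ steps to move the robot across the hypercube. Combining,

$$ \begin{aligned}
T_A(R) &\le (2m_R + 1)\left(d\,n_R^{1/d} + T_A(R-1)\right) + 2m_R \\
&\le \left((n_R/n_{R-1})^{\mu} + 2\right)\left(d\,n_R^{1/d} + T_A(R-1)\right) + (n_R/n_{R-1})^{\mu} + 1 \\
&= O\left((n_R/n_{R-1})^{\mu}\,n_R^{1/d}\right) + \left((n_R/n_{R-1})^{\mu} + 2\right)T_A(R-1) \\
&= O\left((n_R/n_{R-1})^{\mu}\,n_R^{1/d}\right) + (n_R/n_{R-1})^{\mu}\,T_A(R-1) \\
&= O\left((n_R/n_{R-1})^{\mu}\,n_R^{1/d} + (n_R/n_{R-2})^{\mu}\,n_{R-1}^{1/d} + \cdots + (n_R/n_0)^{\mu}\,n_1^{1/d}\right) \\
&= n_R^{\mu} \cdot O\left(\frac{n_R^{1/d}}{n_{R-1}^{\mu}} + \frac{n_{R-1}^{1/d}}{n_{R-2}^{\mu}} + \cdots + \frac{n_1^{1/d}}{n_0^{\mu}}\right) \\
&= n_R^{\mu} \cdot O\left(n_R^{1/d - \beta\mu} + \cdots + n_2^{1/d - \beta\mu} + n_1^{1/d - \beta\mu}\right) \\
&= n_R^{\mu} \cdot O\left(n_1^{1/d - \beta\mu} + \left(n_1^{1/d - \beta\mu}\right)^{1/\beta} + \cdots + \left(n_1^{1/d - \beta\mu}\right)^{(1/\beta)^{R-1}}\right) \\
&= O\left(n_R^{\mu}\right).
\end{aligned} $$

Here the second line follows because $2m_R + 1 \le (n_R/n_{R-1})^{\mu} + 2$; the fourth because the $(n_R/n_{R-1})^{\mu}$ terms increase doubly exponentially, so adding 2 to each will not affect the asymptotics; the seventh because $n_i^{\mu} = \Omega(n_{i+1}^{\beta\mu})$; the eighth because $n_{R-1} \le n_R^{\beta}$; and the last because $\beta\mu > 1/3 \ge 1/d$, hence $n_1^{1/d - \beta\mu} < 1$. ■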

Next we need to lower-bound the success probability. Say that $A_R$ or $U_R$ "succeeds" if a measurement in the standard basis yields the result $|v(C_{i^*}), 1\rangle$, where $C_{i^*}$ is the minimal subcube that contains the marked vertex. Of course, the marked vertex itself can then be found in $n_0 = O(1)$ steps.

Lemma 15 Assuming there is a unique marked vertex, $A_R$ succeeds with probability $\Omega(1/n_R^{1-2\mu})$.

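Proof. Let $P_A(R)$ and $P_U(R)$ be the success probabilities of $A_R$ and $U_R$ respectively when searching $\mathcal{L}_d(n_R)$. Then clearly $P_A(0) = 1$, and $P_U(R) = (n_{R-1}/n_R)\,P_A(R-1)$ for all $R \ge 1$. So by Lemma 9,

$$ \begin{aligned}
P_A(R) &\ge \left(1 - \tfrac{1}{3}(2m_R+1)^2 P_U(R)\right)(2m_R+1)^2 P_U(R) \\
&= \left(1 - \tfrac{1}{3}(2m_R+1)^2\,\frac{n_{R-1}}{n_R}\,P_A(R-1)\right)(2m_R+1)^2\,\frac{n_{R-1}}{n_R}\,P_A(R-1) \\
&\ge \left(1 - \tfrac{1}{3}(n_R/n_{R-1})^{2\mu}\,\frac{n_{R-1}}{n_R}\,P_A(R-1)\right)(n_R/n_{R-1})^{2\mu}\,\frac{n_{R-1}}{n_R}\,P_A(R-1) \\
&\ge \left(1 - \tfrac{1}{3}(n_{R-1}/n_R)^{1-2\mu}\right)(n_{R-1}/n_R)^{1-2\mu}\,P_A(R-1) \\
&\ge (n_0/n_R)^{1-2\mu} \prod_{r=1}^{R}\left(1 - \tfrac{1}{3}(n_{r-1}/n_r)^{1-2\mu}\right) \\
&\ge (n_0/n_R)^{1-2\mu} \prod_{r=1}^{R}\left(1 - \frac{1}{3\,n_r^{(1-\beta)(1-2\mu)}}\right) \\
&= \Omega\left(1/n_R^{1-2\mu}\right).
\end{aligned} $$

Here the third line follows because $2m_R + 1 \ge (n_R/n_{R-1})^{\mu}$ and the function $x - \tfrac{1}{3}x^2$ is nondecreasing in the interval $[0, 1]$; the fourth because $P_A(R-1) \le 1$; the sixth because $n_{R-1} \le n_R^{\beta}$; and the last because $\beta < 1$ and $\mu < 1/2$, the $n_R$'s increase doubly exponentially, and $n_0$ is sufficiently large. ■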

Finally, take $A_R$ itself and amplify it to success probability $\Omega(1)$ by running it $O(n_R^{1/2-\mu})$ times. This yields an algorithm for searching $\mathcal{L}_d(n_R)$ with overall running time $O(n_R^{1/2})$, which implies that

$$ Q\left(\mathrm{OR}^{(1)}, \mathcal{L}_d(n_R)\right) = O\left(n_R^{1/2}\right). $$

All that remains is to handle values of $n$ that do not equal $n_R$ for any $R$. The solution is simple: first find the largest $R$ such that $n_R < n$. Then set $n' = n_R \lceil n^{1/d}/\ell_R \rceil^d$, and embed $\mathcal{L}_d(n)$ into the larger hypercube $\mathcal{L}_d(n')$. Clearly $Q(\mathrm{OR}^{(1)}, \mathcal{L}_d(n)) \le Q(\mathrm{OR}^{(1)}, \mathcal{L}_d(n'))$. Also notice that $n' = O(n)$ and that $n' = O(n_R^{1/\beta}) = O(n_R^{3/2})$. Next partition $\mathcal{L}_d(n')$ into $n'/n_R$ subcubes, each a copy of $\mathcal{L}_d(n_R)$. The algorithm will now have one additional level of recursion, which chooses a subcube of $\mathcal{L}_d(n')$ uniformly at random, runs $A_R$ on that subcube, and then amplifies the resulting procedure $\Theta(\sqrt{n'/n_R})$ times. The total time is now

$$ O\left(\sqrt{\frac{n'}{n_R}}\left((n')^{1/d} + n_R^{1/2}\right)\right) = O\left(\sqrt{n'}\right) = O\left(\sqrt{n}\right), $$
while the success probability is $\Omega(1)$. This completes Theorem 11.

5.3 Dimension 2

In the $d = 2$ case, the best we can achieve is the following:

Theorem 16 $Q(\mathrm{OR}, \mathcal{L}_2) = O(\sqrt{n}\log^{5/2} n)$.

Again, we start with the single marked vertex case and postpone the general case to Sections 5.4 and 5.5.

Theorem 17 $Q(\mathrm{OR}^{(1)}, \mathcal{L}_2) = O(\sqrt{n}\log^{3/2} n)$.

For $d \ge 3$, we performed amplification on large (greater than $O(1/n^{1-2\mu})$) probabilities only once, at the end. For $d = 2$, on the other hand, any algorithm that we construct with any nonzero success probability will have running time $\Omega(\sqrt{n})$, simply because that is the diameter of the grid. If we want to keep the running time $O(\sqrt{n})$, then we can only perform $O(1)$ amplification steps at the end. Therefore we need to keep the success probability relatively high throughout the recursion, meaning that we suffer an increase in the running time, since amplification to high probabilities is less efficient.

The procedures $A_R$, $U_R$, $W_R$, and $S_R$ are identical to those in Section 5.2; all that changes are the parameter settings. For all integers $R \ge 0$, we now let $n_R = \ell_0^{2R}$, for some odd integer $\ell_0 \ge 3$ to be set later. Thus, $A_R$ and $U_R$ search the square grid $\mathcal{L}_2(n_R)$ of size $\ell_0^R \times \ell_0^R$. Also, let $m = (\ell_0 - 1)/2$; then $A_R$ applies $m$ steps of amplitude amplification to $U_R$.

We now prove the counterparts of Lemmas 14 and 15 for the two-dimensional case.

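Lemma 18 $A_R$ uses $O(R\,\ell_0^{R+1})$ steps.

Proof. Let $T_A(R)$ and $T_U(R)$ be the time used by $A_R$ and $U_R$ respectively in searching $\mathcal{L}_2(n_R)$. Then $T_A(0) = 1$, and for all $R \ge 1$,

$$ \begin{aligned}
T_A(R) &\le (2m + 1)\,T_U(R) + 2m, \\
T_U(R) &\le 2\,n_R^{1/2} + T_A(R-1).
\end{aligned} $$

Combining,

$$ \begin{aligned}
T_A(R) &\le (2m + 1)\left(2\,n_R^{1/2} + T_A(R-1)\right) + 2m \\
&= \ell_0\left(2\ell_0^R + T_A(R-1)\right) + \ell_0 - 1 \\
&= O\left(\ell_0^{R+1} + \ell_0\,T_A(R-1)\right) \\
&= O\left(R\,\ell_0^{R+1}\right). \quad ■
\end{aligned} $$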
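The recurrence in this proof can be unrolled exactly. The sketch below treats the inequality as an equality, $T_A(R) = \ell_0(2\ell_0^R + T_A(R-1)) + \ell_0 - 1$ with $T_A(0) = 1$, and compares it with the closed form $2R\,\ell_0^{R+1} + 2\ell_0^R - 1$. The closed form is our own unrolling, included only to make the $O(R\,\ell_0^{R+1})$ growth visible; the function names are hypothetical.

```python
def t_a(R, ell):
    """Unroll T_A(R) = ell * (2 * ell**R + T_A(R-1)) + ell - 1 with T_A(0) = 1."""
    t = 1
    for r in range(1, R + 1):
        t = ell * (2 * ell ** r + t) + ell - 1
    return t

def closed_form(R, ell):
    """Closed form of the recurrence above: 2 R ell**(R+1) + 2 ell**R - 1."""
    return 2 * R * ell ** (R + 1) + 2 * ell ** R - 1
```

Since $2\ell_0^R \le R\,\ell_0^{R+1}$ whenever $R \ge 1$ and $\ell_0 \ge 3$, the closed form is at most $3R\,\ell_0^{R+1}$.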



Lemma 19 $A_R$ succeeds with probability $\Omega(1/R)$.

Proof. Let $P_A(R)$ and $P_U(R)$ be the success probabilities of $A_R$ and $U_R$ respectively when searching $\mathcal{L}_2(n_R)$. Then $P_U(R) = P_A(R-1)/\ell_0^2$ for all $R \ge 1$. So by Lemma 9, and using the fact that $2m + 1 = \ell_0$,

$$ \begin{aligned}
P_A(R) &\ge \left(1 - \frac{(2m+1)^2}{3}\,P_U(R)\right)(2m+1)^2\,P_U(R) \\
&= \left(1 - \frac{\ell_0^2}{3}\cdot\frac{P_A(R-1)}{\ell_0^2}\right)\ell_0^2\cdot\frac{P_A(R-1)}{\ell_0^2} \\
&= P_A(R-1) - \tfrac{1}{3}\,P_A^2(R-1) \\
&= \Omega(1/R).
\end{aligned} $$

This is because $\Omega(R)$ iterations of the map $x_R := x_{R-1} - \tfrac{1}{3}x_{R-1}^2$ are needed to drop from (say) $2/R$ to $1/R$, and $x_0 = P_A(0) = 1$ is greater than $2/R$. ■
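The $\Omega(1/R)$ decay can be observed by iterating the map directly. The sketch below is illustrative. The sandwich $2/(R+2) \le x_R \le 3/(R+3)$ used in the check comes from our own short calculation with $1/x_R$, which grows by between $1/3$ and $1/2$ per step while $x \le 1$; it is not a statement from the original proof.

```python
def success_lower_bounds(levels):
    """Iterate x_R = x_{R-1} - x_{R-1}**2 / 3 from x_0 = 1; return [x_0, ..., x_levels]."""
    xs = [1.0]
    for _ in range(levels):
        x = xs[-1]
        xs.append(x - x * x / 3)
    return xs
```

For instance, after 100 levels the value still lies between $2/102$ and $3/103$, so the final amplification needs only $O(\sqrt{R})$ repetitions.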

We can amplify $A_R$ to success probability $\Omega(1)$ by repeating it $O(\sqrt{R})$ times. This yields an algorithm for searching $\mathcal{L}_2(n_R)$ that uses $O(R^{3/2}\,\ell_0^{R+1}) = O(\sqrt{n_R}\,R^{3/2}\,\ell_0)$ steps in total. We can minimize this expression subject to $\ell_0^{2R} = n_R$ by taking $\ell_0$ to be constant and $R$ to be $\Theta(\log n_R)$, which yields $Q(\mathrm{OR}^{(1)}, \mathcal{L}_2(n_R)) = O(\sqrt{n_R}\,\log^{3/2} n_R)$. If $n$ is not of the form $\ell_0^{2R}$, then we simply find the smallest integer $R$ such that $n < \ell_0^{2R}$, and embed $\mathcal{L}_2(n)$ in the larger grid $\mathcal{L}_2(\ell_0^{2R})$. Since $\ell_0$ is a constant, this increases the running time by at most a constant factor. We have now proved Theorem 17.

5.4 Multiple Marked Items 

What about the case in which there are multiple $i$'s with $x_i = 1$? If there are $k$ marked items (where $k$ need not be known in advance), then Grover's algorithm can find a marked item with high probability in $O(\sqrt{n/k})$ queries, as shown by Boyer et al. [10]. In our setting, however, this is too much to hope for, since even if there are many marked vertices, they might all be in a faraway part of the hypercube. Then $\Omega(n^{1/d})$ steps are needed, even if $\sqrt{n/k} < n^{1/d}$. Indeed, we can show a stronger lower bound. Recall that $\mathrm{OR}^{(k)}$ is the problem of deciding whether there are no marked vertices or exactly $k$ of them.

Theorem 20 For all dimensions $d \ge 2$,

$$ Q\left(\mathrm{OR}^{(k)}, \mathcal{L}_d\right) = \Omega\left(\frac{\sqrt{n}}{k^{1/2 - 1/d}}\right). $$

Here, for simplicity, we ignore constant factors depending on $d$.

Proof. For simplicity, we assume that both $k^{1/d}$ and $(n/(3^d k))^{1/d}$ are integers. (In the general case, we can just replace $k$ by $\lceil k^{1/d} \rceil^d$ and $n$ by the largest integer of the form $(3m)^d k$ which is less than $n$. This only changes the lower bound by a constant factor depending on $d$.)

We use a hybrid argument almost identical to that of Theorem 7. Divide $\mathcal{L}_d$ into $n/k$ subcubes, each having $k$ vertices and side length $k^{1/d}$. Let $S$ be a regularly-spaced set of $M = n/(3^d k)$ of these subcubes, so that any two subcubes in $S$ have distance at least $2k^{1/d}$ from one another. Then choose a subcube $C_j \in S$ uniformly at random and mark all $k$ vertices in $C_j$. This enables us to consider each $C_j \in S$ itself as a single vertex (out of $M$ in total), having distance at least $2k^{1/d}$ to every other vertex.

More formally, given a subcube $C_j \in S$, let $\widetilde{C}_j$ be the set of vertices consisting of $C_j$ and the $3^d - 1$ subcubes surrounding it. (Thus, $\widetilde{C}_j$ is a subcube of side length $3k^{1/d}$.) Then the query magnitude of $\widetilde{C}_j$ after the $t$-th query is

$$ \Gamma_j^{(t)} = \sum_{v_i \in \widetilde{C}_j} \sum_{z} \left|\alpha_{i,z}^{(t)}(X_0)\right|^2, $$

where $X_0$ is the all-zero input. Let $T$ be the number of queries, and let $w = T/(ck^{1/d})$ for some constant $c > 0$. Then as in Theorem 7, there must exist a subcube $\widetilde{C}_{j^*}$ such that

$$ \sum_{q=0}^{w-1} \Gamma_{j^*}^{(T - qck^{1/d})} \le \frac{w}{M} = \frac{3^d k w}{n}. $$

Let $Y$ be the input which is 1 in $C_{j^*}$ and 0 elsewhere; then let $X_q$ be a hybrid input which is $X_0$ during queries 1 to $T - qck^{1/d}$, but $Y$ during queries $T - qck^{1/d} + 1$ to $T$. Next let

$$ D(q, r) = \sum_{v_i \in \mathcal{L}_d} \sum_{z} \left|\alpha_{i,z}^{(T)}(X_q) - \alpha_{i,z}^{(T)}(X_r)\right|^2 . $$

Then as in Theorem 7, for all $c < 1$ we have $D(q-1, q) \le 4\,\Gamma_{j^*}^{(T - qck^{1/d})}$. For in the $ck^{1/d}$ queries from $T - qck^{1/d} + 1$ through $T - (q-1)ck^{1/d}$, no amplitude originating outside $\widetilde{C}_{j^*}$ can travel a distance $k^{1/d}$ and thereby reach $C_{j^*}$. Therefore switching from $X_{q-1}$ to $X_q$ can only affect amplitude that is in $\widetilde{C}_{j^*}$ immediately after query $T - qck^{1/d}$. It follows that

$$ \sqrt{D(0, w)} \le \sum_{q=1}^{w} \sqrt{D(q-1, q)} \le 2\sum_{q=1}^{w} \sqrt{\Gamma_{j^*}^{(T - qck^{1/d})}} \le 2w\sqrt{\frac{3^d k}{n}} = \frac{2\sqrt{3^d}\,k^{1/2 - 1/d}\,T}{c\sqrt{n}} . $$

Hence $T = \Omega(\sqrt{n}/k^{1/2 - 1/d})$ for constant $d$, since assuming the algorithm is correct we need $D(0, w) = \Omega(1)$. ■

Notice that if $k \approx n$, then the bound of Theorem 20 becomes $\Omega(n^{1/d})$, which is just the diameter of $\mathcal{L}_d$. Also, if $d = 2$, then $1/2 - 1/d = 0$ and the bound is simply $\Omega(\sqrt{n})$ independent of $k$. The bound of Theorem 20 can be achieved (up to a constant factor that depends on $d$) for $d \ge 3$, and nearly achieved for $d = 2$. We first construct an algorithm for the case when $k$ is known.

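Theorem 21

(i) For $d \ge 3$,

$$ Q\left(\mathrm{OR}^{(k)}, \mathcal{L}_d\right) = O\left(\frac{\sqrt{n}}{k^{1/2 - 1/d}}\right). $$

(ii) For $d = 2$,

$$ Q\left(\mathrm{OR}^{(k)}, \mathcal{L}_2\right) = O\left(\sqrt{n}\,\log^{3/2} n\right). $$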

To prove Theorem 21, we first divide $\mathcal{L}_d(n)$ into $n/\gamma$ subcubes, each of size $\gamma^{1/d} \times \cdots \times \gamma^{1/d}$ (where $\gamma$ will be fixed later). Then in each subcube, we choose one vertex uniformly at random.

Lemma 22 If $\gamma \ge k$, then the probability that exactly one marked vertex is chosen is at least $k/\gamma - (k/\gamma)^2$.

Proof. Let $x$ be a marked vertex. The probability that $x$ is chosen is $1/\gamma$. Given that $x$ is chosen, the probability that one of the other marked vertices, $y$, is chosen is 0 if $x$ and $y$ belong to the same subcube, or $1/\gamma$ if they belong to different subcubes. Therefore, the probability that $x$ alone is chosen is at least

$$ \frac{1}{\gamma}\left(1 - \frac{k}{\gamma}\right). $$

Since the events "$x$ alone is chosen" are mutually disjoint, we conclude that the probability that exactly one marked vertex is chosen is at least $k/\gamma - (k/\gamma)^2$. ■
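Lemma 22 can be checked exactly on small configurations. The sketch below assumes the marked vertices are summarized by how many fall in each subcube; the helper names are hypothetical. It computes the exact probability that exactly one of the chosen vertices is marked, for comparison with $k/\gamma - (k/\gamma)^2$.

```python
from fractions import Fraction

def prob_exactly_one(marked_per_subcube, gamma):
    """Exact probability that exactly one chosen vertex is marked.

    marked_per_subcube[b] is the number of marked vertices in subcube b.
    Each subcube has gamma vertices and contributes one uniformly random choice.
    """
    hit = [Fraction(kb, gamma) for kb in marked_per_subcube]
    total = Fraction(0)
    for b, p in enumerate(hit):
        others_miss = Fraction(1)
        for c, q in enumerate(hit):
            if c != b:
                others_miss *= 1 - q
        total += p * others_miss
    return total

def lemma22_bound(k, gamma):
    """Lower bound k/gamma - (k/gamma)**2 from Lemma 22."""
    return Fraction(k, gamma) - Fraction(k, gamma) ** 2
```

With $\gamma = 9$ and six marked vertices split as 3, 0, 1, 2 over four subcubes, the exact probability is $34/81$, comfortably above the bound $2/9$.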

In particular, fix $\gamma$ so that $\gamma/3 < k < 2\gamma/3$; then Lemma 22 implies that the probability of choosing exactly one marked vertex is at least 2/9. The algorithm is now as follows. As in the lemma, subdivide $\mathcal{L}_d(n)$ into $n/\gamma$ subcubes and choose one location at random from each. Then run the algorithm for the unique-solution case (Theorem 11 or 17) on the chosen locations only, as if they were vertices of $\mathcal{L}_d(n/\gamma)$.

The running time in the unique case was $O(\sqrt{n/\gamma})$ for $d \ge 3$ or

$$ O\left(\sqrt{\frac{n}{\gamma}}\,\log^{3/2}(n/\gamma)\right) = O\left(\sqrt{\frac{n}{\gamma}}\,\log^{3/2} n\right) $$

for $d = 2$. However, each local unitary in the original algorithm now becomes a unitary affecting two vertices $v$ and $w$ in neighboring subcubes $C_v$ and $C_w$. When placed side by side, $C_v$ and $C_w$ form a rectangular box of size $2\gamma^{1/d} \times \gamma^{1/d} \times \cdots \times \gamma^{1/d}$. Therefore the distance between $v$ and $w$ is at most $(d+1)\gamma^{1/d}$. It follows that each local unitary in the original algorithm takes $O(d\gamma^{1/d})$ time in the new algorithm. For $d \ge 3$, this results in an overall running time of

$$ O\left(\sqrt{\frac{n}{\gamma}}\,d\gamma^{1/d}\right) = O\left(\frac{d\sqrt{n}}{\gamma^{1/2 - 1/d}}\right) = O\left(\frac{d\sqrt{n}}{k^{1/2 - 1/d}}\right). $$

For $d = 2$ we obtain

$$ O\left(\sqrt{\frac{n}{\gamma}}\,\gamma^{1/2}\,\log^{3/2} n\right) = O\left(\sqrt{n}\,\log^{3/2} n\right). $$



5.5 Unknown Number of Marked Items 

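We now show how to deal with an unknown $k$. Let $\mathrm{OR}^{(\ge k)}$ be the problem of deciding whether there are no marked vertices or at least $k$ of them, given that one of these is true.

Theorem 23

(i) For $d \ge 3$,

$$ Q\left(\mathrm{OR}^{(\ge k)}, \mathcal{L}_d\right) = O\left(\frac{\sqrt{n}}{k^{1/2 - 1/d}}\right). $$

(ii) For $d = 2$,

$$ Q\left(\mathrm{OR}^{(\ge k)}, \mathcal{L}_2\right) = O\left(\sqrt{n}\,\log^{5/2} n\right). $$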

Proof. We use the straightforward 'doubling' approach of Boyer et al. [10]:

(1) For $j = 0$ to $\log_2(n/k)$

• Run the algorithm of Theorem 21 with subcubes of size $\gamma_j = 2^j k$.

• If a marked vertex is found, then output 1 and halt.

(2) Query a random vertex $v$, and output 1 if $v$ is a marked vertex and 0 otherwise.

Let $k^* \ge k$ be the number of marked vertices. If $k^* \le n/3$, then there exists a $j \le \log_2(n/k)$ such that $\gamma_j/3 \le k^* \le 2\gamma_j/3$. So Lemma 22 implies that the $j$-th iteration of step (1) finds a marked vertex with probability at least 2/9. On the other hand, if $k^* \ge n/3$, then step (2) finds a marked vertex with probability at least 1/3. For $d \ge 3$, the time used in step (1) is at most

$$ \sum_{j=0}^{\log_2(n/k)} O\left(\frac{\sqrt{n}}{\gamma_j^{1/2 - 1/d}}\right) = O\left(\frac{\sqrt{n}}{k^{1/2 - 1/d}}\left[\sum_{j=0}^{\log_2(n/k)} \frac{1}{2^{j(1/2 - 1/d)}}\right]\right) = O\left(\frac{\sqrt{n}}{k^{1/2 - 1/d}}\right), $$

the sum in brackets being a decreasing geometric series. For $d = 2$, the time is $O(\sqrt{n}\,\log^{5/2} n)$, since each iteration takes $O(\sqrt{n}\,\log^{3/2} n)$ time and there are at most $\log n$ iterations. In neither case does step (2) affect the bound, since $k \le n$ implies that $n^{1/d} \le \sqrt{n}/k^{1/2 - 1/d}$. ■
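The existence of a good iteration in the doubling schedule can be made explicit. The sketch below uses exact integer and rational arithmetic to find the smallest $j$ with $\gamma_j/3 \le k^* \le 2\gamma_j/3$ for $\gamma_j = 2^j k$, and evaluates the Lemma 22 bound there. It is an illustration of the counting argument only, not of the quantum search itself, and the function names are our own.

```python
from fractions import Fraction

def good_iteration(k, k_star):
    """Smallest j >= 0 with gamma_j / 3 <= k_star <= 2 * gamma_j / 3, where gamma_j = 2**j * k."""
    assert 1 <= k <= k_star
    j = 0
    while 2 * (2 ** j) * k < 3 * k_star:   # loop until k_star <= 2 * gamma_j / 3
        j += 1
    assert (2 ** j) * k <= 3 * k_star      # gamma_j / 3 <= k_star holds as well
    return j

def success_lower_bound(k, k_star):
    """Lemma 22 bound x - x**2 at x = k_star / gamma_j for the iteration found above."""
    x = Fraction(k_star, 2 ** good_iteration(k, k_star) * k)
    return x - x * x
```

Because $k^*/\gamma_j$ lands in $[1/3, 2/3]$, the bound $x - x^2$ never drops below $2/9$, which is the constant used in the proof.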

Taking $k = 1$ gives algorithms for unconstrained OR with running times $O(\sqrt{n})$ for $d \ge 3$ and $O(\sqrt{n}\,\log^{5/2} n)$ for $d = 2$, thereby establishing Theorems 10 and 16.

6 Search on Irregular Graphs



In Section 1.2, we claimed that our divide-and-conquer approach has the advantage of being robust: it works
not only for highly symmetric graphs such as hypercubes, but for any graphs having comparable expansion 
properties. Let us now substantiate this claim. 

Say a family of connected graphs $\{G_n = (V_n, E_n)\}$ is $d$-dimensional if there exists a $\kappa > 0$ such that for all $n$, $\ell$ and $v \in V_n$,

$$ |B(v, \ell)| \ge \min\left(\kappa\ell^d, n\right), $$

where $B(v, \ell)$ is the set of vertices having distance at most $\ell$ from $v$ in $G_n$. Intuitively, $G_n$ is $d$-dimensional (for $d \ge 2$ an integer) if its expansion properties are at least as good as those of the hypercube $\mathcal{L}_d(n)$ (see footnote 5). It is immediate that the diameter of $G_n$ is at most $(n/\kappa)^{1/d}$. Note, though, that $G_n$ might not be an expander graph in the usual sense, since we have not required that every sufficiently small set of vertices has many neighbors.
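The growth condition can be tested by brute force on a small graph. The sketch below checks a single connected graph for a given $\kappa$, whereas the definition quantifies over a whole family $\{G_n\}$. It uses a small square grid as a stand-in for $\mathcal{L}_2(n)$, and the function names are hypothetical.

```python
from collections import deque

def grid_graph(side):
    """Adjacency lists of the side x side grid (a stand-in for L_2(n), n = side**2)."""
    adj = {}
    for x in range(side):
        for y in range(side):
            nbrs = [(x + dx, y + dy) for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            adj[(x, y)] = [(a, b) for a, b in nbrs if 0 <= a < side and 0 <= b < side]
    return adj

def ball_sizes(adj, v):
    """Return a list whose entry l is |B(v, l)|, for l = 0 .. eccentricity of v (via BFS)."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    sizes = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        sizes[d] += 1
    for l in range(1, len(sizes)):
        sizes[l] += sizes[l - 1]
    return sizes

def satisfies_growth(adj, d, kappa):
    """Check |B(v, l)| >= min(kappa * l**d, n) for every vertex v and every radius l.

    Radii beyond the eccentricity of v need no check in a connected graph,
    because the ball is then the whole vertex set.
    """
    n = len(adj)
    for v in adj:
        for l, size in enumerate(ball_sizes(adj, v)):
            if size < min(kappa * l ** d, n):
                return False
    return True
```

On the $6 \times 6$ grid the condition holds with $\kappa = 1/4$ but fails with $\kappa = 1/2$: the ball of radius 9 around a corner has 35 vertices, one short of $n = 36$. This shows why the constant $\kappa$ has to absorb boundary effects.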

Our goal is to show the following. 

Theorem 24 If $G$ is $d$-dimensional, then

(i) For a constant $d > 2$,

$$ Q(\mathrm{OR}, G) = O\left(\sqrt{n}\,\mathrm{polylog}\,n\right). $$

(ii) For $d = 2$,

$$ Q(\mathrm{OR}, G) = \sqrt{n}\,2^{O(\sqrt{\log n})}. $$

In proving part (i), the intuition is simple: we want to decompose $G$ recursively into subgraphs (called clusters), which will serve the same role as subcubes did in the hypercube case. The procedure is as follows. For some constant $n_1 > 1$, first choose $\lceil n/n_1 \rceil$ vertices uniformly at random to be designated as 1-pegs. Then form 1-clusters by assigning each vertex in $G$ to its closest 1-peg, as in a Voronoi diagram. (Ties are broken randomly.) Let $v(C)$ be the peg of cluster $C$. Next, split up any 1-cluster $C$ with more than $n_1$ vertices into $\lceil |C|/n_1 \rceil$ arbitrarily-chosen 1-clusters, each with size at most $n_1$ and with $v(C)$ as its 1-peg. Observe that

$$ \sum_{i=1}^{\lceil n/n_1 \rceil} \left\lceil \frac{|C_i|}{n_1} \right\rceil \le 2\left\lceil \frac{n}{n_1} \right\rceil, $$

where $n = |C_1| + \cdots + |C_{\lceil n/n_1 \rceil}|$. Therefore, the splitting-up step can at most double the number of clusters.
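The first level of the decomposition can be sketched classically. The code below is a simplified illustration under stated assumptions: the graph is connected and given as adjacency lists, ties between pegs are broken by breadth-first order rather than randomly as in the text, and oversized cells are split into consecutive chunks. The name `one_clusters` is hypothetical.

```python
import math
import random
from collections import deque

def one_clusters(adj, n1, rng):
    """Form 1-clusters: Voronoi cells around random 1-pegs, split to size <= n1.

    Returns a list of (peg, vertices) pairs. Ties are broken by BFS order here,
    which is a simplification of the random tie-breaking in the text.
    """
    n = len(adj)
    pegs = rng.sample(sorted(adj), math.ceil(n / n1))
    owner = {p: p for p in pegs}
    queue = deque(pegs)
    while queue:                      # multi-source BFS: each vertex joins a closest peg
        u = queue.popleft()
        for w in adj[u]:
            if w not in owner:
                owner[w] = owner[u]
                queue.append(w)
    cells = {p: [] for p in pegs}
    for v, p in owner.items():
        cells[p].append(v)
    clusters = []
    for p, members in cells.items():  # split oversized cells into chunks of size <= n1
        for start in range(0, len(members), n1):
            clusters.append((p, members[start:start + n1]))
    return clusters
```

On any connected graph the output partitions the vertex set into at most $2\lceil n/n_1 \rceil$ clusters of size at most $n_1$, as in the displayed inequality. Higher levels would repeat the same step on pegs instead of vertices.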

In the next iteration, set $n_2 = n_1^{1/\beta}$, for some constant $\beta \in (2/d, 1)$. Choose $2\lceil n/n_2 \rceil$ vertices uniformly at random as 2-pegs. Then form 2-clusters by assigning each 1-cluster $C$ to the 2-peg that is closest to the 1-peg $v(C)$. Given a 2-cluster $C'$, let $|C'|$ be the number of 1-clusters in $C'$. Then as before, split up any $C'$ with $|C'| > n_2/n_1$ into $\lceil |C'|/(n_2/n_1) \rceil$ arbitrarily-chosen 2-clusters, each with size at most $n_2/n_1$ and with $v(C')$ as its 2-peg. Continue recursively in this manner, setting $n_R = n_{R-1}^{1/\beta}$ and choosing $2^{R-1}\lceil n/n_R \rceil$ vertices as $R$-pegs for each $R$. Stop at the maximum $R$ such that $n_R \le n$. For technical convenience, set $n_0 = 1$, and consider each vertex $v$ to be the 0-peg of the 0-cluster $\{v\}$.

For $R \ge 1$, define the radius of an $R$-cluster $C$ to be the maximum, over all $(R-1)$-clusters $C'$ in $C$, of the distance from $v(C)$ to $v(C')$. Also, call an $R$-cluster good if it has radius at most $\ell_R$, where

$$ \ell_R = \left(\frac{2}{\kappa}\,n_R \ln n\right)^{1/d}. $$

Footnote 5: In general, it makes sense to consider non-integer $d$ as well.

Lemma 25 With probability $1 - o(1)$ over the choice of clusters, all clusters are good.

Proof. Let $v$ be the $(R-1)$-peg of an $(R-1)$-cluster. Then $|B(v, \ell)| \ge \kappa\ell^d$, where $B(v, \ell)$ is the ball of radius $\ell$ about $v$. So the probability that $v$ has distance greater than $\ell_R$ to the nearest $R$-peg is at most

$$ \left(1 - \frac{\kappa\ell_R^d}{n}\right)^{\lceil n/n_R \rceil} \le \left(1 - \frac{2\ln n}{n/n_R}\right)^{n/n_R} < \frac{1}{n^2}. $$

Furthermore, the total number of pegs is easily seen to be $O(n)$. It follows by the union bound that every $(R-1)$-peg for every $R$ has distance at most $\ell_R$ to the nearest $R$-peg, with probability $1 - O(1/n) = 1 - o(1)$ over the choice of clusters. ■
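The probability estimate in this proof rests on the elementary inequality $(1 - x/N)^N \le e^{-x}$ with $x = 2\ln n$ and $N = n/n_R$. The sketch below evaluates the middle expression for a few sample values; the function name is hypothetical, and the case of a nonpositive base is treated as probability 0 because the ball then already covers the whole graph.

```python
import math

def miss_probability_bound(n, n_R):
    """Evaluate (1 - 2 ln(n) / N) ** N with N = n / n_R, as in the proof of Lemma 25."""
    N = n / n_R
    base = 1 - 2 * math.log(n) / N
    if base <= 0:
        # kappa * l_R**d >= n: the ball is the whole graph, so no peg can be missed.
        return 0.0
    return base ** N
```

For all sampled values the result stays below $1/n^2$, which is what the union bound over $O(n)$ pegs requires.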

At the end we have a tree of clusters, which can be searched recursively just as in the hypercube case. Lemma 25 gives us a guarantee on the time needed to move a level down (from a peg of an $R$-cluster to a peg of an $(R-1)$-cluster contained in it) or a level up. Also, let $K'(C)$ be the number of $(R-1)$-clusters in $R$-cluster $C$; then $K'(C) \le K(R)$, where $K(R) = 2\lceil n_R/n_{R-1} \rceil$. If $K'(C) < K(R)$, then place $K(R) - K'(C)$ "dummy" $(R-1)$-clusters in $C$, each of which has $(R-1)$-peg $v(C)$. Now, every $R$-cluster contains an equal number of $(R-1)$-clusters.

Our algorithm is similar to that of Section 5.2, but the basis states now have the form $|v, z, C\rangle$, where $v$ is a vertex, $z$ is an answer bit, and $C$ is the label of the cluster currently being searched. (Unfortunately, because multiple $R$-clusters can have the same peg, a single auxiliary qubit no longer suffices.)

The algorithm $A_R$ from Section 5.2 now does the following, when invoked on the initial state $|v(C), 0, C\rangle$, where $C$ is an $R$-cluster. If $R = 0$, then $A_R$ uses a query transformation to prepare the state $|v(C), 1, C\rangle$ if $v(C)$ is the marked vertex and $|v(C), 0, C\rangle$ otherwise. If $R \ge 1$ and $C$ is not a dummy cluster, then $A_R$ performs $m_R$ steps of amplitude amplification on $U_R$, where $m_R$ is the largest integer such that $2m_R + 1 \le \sqrt{n_R/n_{R-1}}$ (see footnote 6). If $C$ is a dummy cluster, then $A_R$ does nothing for an appropriate number of steps, and then returns that no marked item was found.

We now describe the subroutine $U_R$, for $R \ge 1$. When invoked with $|v(C), 0, C\rangle$ as its initial state, $U_R$ first prepares a uniform superposition

$$ |\phi_C\rangle = \frac{1}{\sqrt{K(R)}} \sum_{i=1}^{K(R)} |v(C_i), 0, C_i\rangle . $$

It does this by first constructing a spanning tree $T$ for $C$, rooted at $v(C)$ and having minimal depth, and then moving amplitude along the edges of $T$ so as to prepare $|\phi_C\rangle$. After $|\phi_C\rangle$ has been prepared, $U_R$ then calls $A_{R-1}$ recursively, to search $C_1, \ldots, C_{K(R)}$ in superposition and amplify the results. Note that, because of the cluster labels, there is no reason why amplitude being routed through $C$ should not pass through some other cluster $C'$ along the way, but there is also no advantage in our analysis for allowing this.
We now analyze the running time and success probability of $A_R$.

Lemma 26 $A_R$ uses $O(\sqrt{n_R}\,\log^{1/d} n)$ steps, assuming that all clusters are good.

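Proof. Let $T_A(R)$ and $T_U(R)$ be the time used by $A_R$ and $U_R$ respectively in searching an $R$-cluster. Then we have

$$ \begin{aligned}
T_A(R) &\le \sqrt{n_R/n_{R-1}}\;T_U(R), \\
T_U(R) &\le \ell_R + T_A(R-1)
\end{aligned} $$

with the base case $T_A(0) = 1$. Combining,

$$ \begin{aligned}
T_A(R) &\le \sqrt{n_R/n_{R-1}}\,\left(\ell_R + T_A(R-1)\right) \\
&\le \sqrt{n_R/n_{R-1}}\;\ell_R + \sqrt{n_R/n_{R-2}}\;\ell_{R-1} + \cdots + \sqrt{n_R/n_0}\;\ell_1 \\
&= \sqrt{n_R} \cdot O\left(\frac{(n_R \ln n)^{1/d}}{\sqrt{n_{R-1}}} + \cdots + \frac{(n_1 \ln n)^{1/d}}{\sqrt{n_0}}\right) \\
&= \sqrt{n_R}\,\left(\ln^{1/d} n\right) \cdot O\left(n_R^{1/d - \beta/2} + \cdots + n_1^{1/d - \beta/2}\right) \\
&= \sqrt{n_R}\,\left(\ln^{1/d} n\right) \cdot O\left(n_1^{1/d - \beta/2} + \left(n_1^{1/d - \beta/2}\right)^{1/\beta} + \cdots + \left(n_1^{1/d - \beta/2}\right)^{(1/\beta)^{R-1}}\right) \\
&= O\left(\sqrt{n_R}\,\log^{1/d} n\right),
\end{aligned} $$

where the last line holds because $\beta > 2/d$ and therefore $n_1^{1/d - \beta/2} < 1$. ■

Footnote 6: In the hypercube case, we performed fewer amplifications in order to lower the running time from $\sqrt{n}\,\mathrm{polylog}\,n$ to $\sqrt{n}$. Here, though, the splitting-up step produces a $\mathrm{polylog}\,n$ factor anyway.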

Lemma 27 $A_R$ succeeds with probability $\Omega(1/\mathrm{polylog}\,n_R)$ in searching a graph of size $n = n_R$, assuming there is a unique marked vertex.

Proof. For all $R \ge 0$, let $C_R$ be the $R$-cluster that contains the marked vertex, and let $P_A(R)$ and $P_U(R)$ be the success probabilities of $A_R$ and $U_R$ respectively when searching $C_R$. Then for all $R \ge 1$, we have $P_U(R) = P_A(R-1)/K(R)$, and therefore

$$ \begin{aligned}
P_A(R) &\ge \left(1 - \frac{(2m_R+1)^2}{3}\,P_U(R)\right)(2m_R+1)^2\,P_U(R) \\
&= \left(1 - \frac{(2m_R+1)^2}{3}\cdot\frac{P_A(R-1)}{K(R)}\right)(2m_R+1)^2\,\frac{P_A(R-1)}{K(R)} \\
&= \Omega\left(P_A(R-1)\right) \\
&= \Omega\left(1/\mathrm{polylog}\,n_R\right).
\end{aligned} $$

Here the third line holds because $(2m_R+1)^2 \approx n_R/n_{R-1} \approx K(R)/2$, and the last line because $R = \Theta(\log\log n_R)$. ■

Finally, we repeat $A_R$ itself $O(\mathrm{polylog}\,n_R)$ times, to achieve success probability $\Omega(1)$ using $O(\sqrt{n_R}\,\mathrm{polylog}\,n_R)$ steps in total. Again, if $n$ is not equal to $n_R$ for any $R$, then we simply find the largest $R$ such that $n_R < n$, and then add one more level of recursion that searches a random $R$-cluster and amplifies the result $\Theta(\sqrt{n/n_R})$ times. The resulting algorithm uses $O(\sqrt{n}\,\mathrm{polylog}\,n)$ steps, thereby establishing part (i) of Theorem 24 for the case of a unique marked vertex. The generalization to multiple marked vertices is straightforward.

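Corollary 28 If $G$ is $d$-dimensional for a constant $d > 2$, then

$$ Q\left(\mathrm{OR}^{(\ge k)}, G\right) = O\left(\frac{\sqrt{n}\,\mathrm{polylog}(n/k)}{k^{1/2 - 1/d}}\right). $$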



Proof. Assume without loss of generality that $k = o(n)$, since otherwise a marked item is trivially found in $O(n^{1/d})$ steps. As in Theorem 23, we give an algorithm $B$ consisting of $\log_2(n/k) + 1$ iterations. In iteration $j = 0$, choose $\lceil n/k \rceil$ vertices $w_1, \ldots, w_{\lceil n/k \rceil}$ uniformly at random. Then run the algorithm for the unique marked vertex case, but instead of taking all vertices in $G$ as 0-pegs, take only $w_1, \ldots, w_{\lceil n/k \rceil}$. On the other hand, still choose the 1-pegs, 2-pegs, and so on uniformly at random from among all vertices in $G$. For all $R$, the number of $R$-pegs should be $\lceil (n/k)/n_R \rceil$. In general, in iteration $j$ of $B$, choose $\lceil n/(2^j k) \rceil$ vertices $w_1, \ldots, w_{\lceil n/(2^j k) \rceil}$ uniformly at random, and then run the algorithm for a unique marked vertex as if $w_1, \ldots, w_{\lceil n/(2^j k) \rceil}$ were the only vertices in the graph.

It is easy to see that, assuming there are $k$ or more marked vertices, with probability $\Omega(1)$ there exists an iteration $j$ such that exactly one of $w_1, \ldots, w_{\lceil n/(2^j k) \rceil}$ is marked. Hence $B$ succeeds with probability $\Omega(1)$. It remains only to upper-bound $B$'s running time.

In iteration $j$, notice that Lemma 25 goes through if we use $\ell_R^{(j)} := \left(\frac{2}{\kappa}\,2^j k\,n_R \ln\frac{n}{k}\right)^{1/d}$ instead of $\ell_R$. That is, with probability $1 - O(k/n) = 1 - o(1)$ over the choice of clusters, every $R$-cluster has radius at most $\ell_R^{(j)}$. So letting $T_A(R)$ be the running time of $A_R$ on an $R$-cluster, the recurrence in Lemma 26 becomes

$$ T_A(R) \le \sqrt{n_R/n_{R-1}}\,\left(\ell_R^{(j)} + T_A(R-1)\right) = O\left(\sqrt{n_R}\,\left(2^j k \log(n/k)\right)^{1/d}\right), $$

which is

$$ O\left(\frac{\sqrt{n}\,\log^{1/d}(n/k)}{(2^j k)^{1/2 - 1/d}}\right) $$

if $n_R = \Theta(n/(2^j k))$. As usual, the case where there is no $R$ such that $n_R = \Theta(n/(2^j k))$ is trivially handled by adding one more level of recursion. If we factor in the $O(\mathrm{polylog}\,n_R)$ repetitions of $A_R$ needed to boost the success probability to $\Omega(1)$, then the total running time of iteration $j$ is

$$ O\left(\frac{\sqrt{n}\,\mathrm{polylog}(n/k)}{(2^j k)^{1/2 - 1/d}}\right). $$

Therefore $B$'s running time is

$$ O\left(\sum_{j=0}^{\log_2(n/k)} \frac{\sqrt{n}\,\mathrm{polylog}(n/k)}{(2^j k)^{1/2 - 1/d}}\right) = O\left(\frac{\sqrt{n}\,\mathrm{polylog}(n/k)}{k^{1/2 - 1/d}}\right). $$

■

For the $d = 2$ case, the best upper bound we can show is $\sqrt{n}\, 2^{O(\sqrt{\log n})}$. This is obtained by simply modifying $A_R$ to have a deeper recursion tree. Instead of taking $n_R = n_{R-1}^{1/\mu}$ for some $\mu$, we take $n_R = 2^{\sqrt{\log n}}\, n_{R-1}$, so that the total number of levels is $\lceil \sqrt{\log n} \rceil$. Lemma 23 goes through without modification, while the recurrence for the running time becomes



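$$\begin{aligned}
T_A(R) &\le \sqrt{n_R/n_{R-1}}\,\bigl(\ell_R + T_A(R-1)\bigr) \\
&\le \sqrt{n_R/n_{R-1}}\,\ell_R + \sqrt{n_R/n_{R-2}}\,\ell_{R-1} + \cdots + \sqrt{n_R/n_0}\,\ell_1 \\
&= O\!\left(2^{\sqrt{\log n}\,(R+1)/2}\sqrt{\ln n} + \cdots + 2^{\sqrt{\log n}\,(R+1)/2}\sqrt{\ln n}\right) \\
&= \sqrt{n}\, 2^{O(\sqrt{\log n})}.
\end{aligned}$$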

Also, since the success probability decreases by at most a constant factor at each level, we have that $P_A(R) = 2^{-O(\sqrt{\log n})}$, and hence $2^{O(\sqrt{\log n})}$ amplification steps suffice to boost the success probability to $\Omega(1)$. Handling multiple marked items adds an additional factor of $\log n$, which is absorbed into $2^{O(\sqrt{\log n})}$. This completes Theorem 24.
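The unrolled $d = 2$ recurrence can be evaluated numerically. The sketch below is illustrative only: it assumes $n_0 = 1$, $T_A(0) = 0$, and $\ell_R = \sqrt{n_R \ln n}$ with all constants dropped, and the function names are hypothetical. Under those assumptions every term of the unrolled sum is equal, which gives a simple closed form to compare against.

```python
from math import ceil, isclose, log, log2, sqrt


def unrolled_recurrence(n):
    """Iterate T(R) = sqrt(n_R / n_{R-1}) * (l_R + T(R-1)) with T(0) = 0.

    Assumptions of this sketch: n_R = 2^(R * sqrt(log2 n)), n_0 = 1, and
    l_R = sqrt(n_R * ln n) with constants dropped.
    """
    s = sqrt(log2(n))
    levels = ceil(s)

    def size(r):
        return 2.0 ** (r * s)

    def radius(r):
        return sqrt(size(r) * log(n))

    t = 0.0
    for r in range(1, levels + 1):
        t = sqrt(size(r) / size(r - 1)) * (radius(r) + t)
    return t


def closed_form(n):
    """R equal terms, each 2^(sqrt(log2 n) * (R + 1) / 2) * sqrt(ln n)."""
    s = sqrt(log2(n))
    levels = ceil(s)
    return levels * 2.0 ** (s * (levels + 1) / 2) * sqrt(log(n))


if __name__ == "__main__":
    for n in (2 ** 16, 2 ** 20):
        t = unrolled_recurrence(n)
        print(n, t / sqrt(n), log2(t / sqrt(n)), sqrt(log2(n)))
```

The printed overhead $T/\sqrt{n}$ has a base-2 logarithm within a small constant multiple of $\sqrt{\log n}$, in line with the $\sqrt{n}\,2^{O(\sqrt{\log n})}$ bound.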



6.1 Bits Scattered on a Graph 

In Section 2 we discussed several ways to pack a given amount of entropy into a spatial region of given dimensions. However, we said nothing about how the entropy is distributed within the region. It might be uniform, or concentrated on the boundary, or distributed in some other way. So we need to answer the following: suppose that in some graph, $h$ out of the $n$ vertices might be marked, and we know which $h$ those are. Then how much time is needed to determine whether any of the $h$ is marked? If the graph is the hypercube $\mathcal{L}_d$ for $d \ge 2$ or is $d$-dimensional for $d > 2$, then the results of the previous sections imply that $O(\sqrt{n}\,\mathrm{polylog}\, n)$ steps suffice. However, we wish to use fewer steps, taking advantage of the fact that $h$ might be much smaller than $n$. Formally, suppose we are given a graph $G$ with $n$ vertices, of which $h$ are potentially marked. Let $\mathrm{OR}^{(h,\ge k)}$ be the problem of deciding whether $G$ has no marked vertices or at least $k$ of them, given that one of these is the case.

Proposition 29 For all integer constants $d \ge 2$, there exists a $d$-dimensional graph $G$ such that

$$Q\bigl(\mathrm{OR}^{(h,\ge k)}, G\bigr) = \Omega\!\left(n^{1/d}\,(h/k)^{1/2-1/d}\right).$$

Proof. Let $G$ be the $d$-dimensional hypercube $\mathcal{L}_d(n)$. Create $h/k$ subcubes of potentially marked vertices, each having $k$ vertices and side length $k^{1/d}$. Space these subcubes out in $\mathcal{L}_d(n)$ so that the distance between any pair of them is $\Omega\bigl((nk/h)^{1/d}\bigr)$. Then choose a subcube $C$ uniformly at random and mark all $k$ vertices in $C$. This enables us to consider each subcube as a single vertex, having distance $\Omega\bigl((nk/h)^{1/d}\bigr)$
to every other vertex. The lower bound now follows by a hybrid argument essentially identical to that of 
Theorem ■ 
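The form of the lower bound in Proposition 29 can be read off from a heuristic accounting (a sketch of where the exponents come from, not a substitute for the hybrid argument): searching among $h/k$ super-vertices takes on the order of $\sqrt{h/k}$ queries, and consecutive queries are separated by the distance between subcubes.

```latex
\underbrace{\sqrt{h/k}}_{\text{queries over } h/k \text{ subcubes}}
\times
\underbrace{(nk/h)^{1/d}}_{\text{distance between subcubes}}
= n^{1/d}\, h^{1/2-1/d}\, k^{1/d-1/2}
= n^{1/d}\,(h/k)^{1/2-1/d}.
```

For $d = 2$ the exponent $1/2 - 1/d$ vanishes and the expression reduces to $\sqrt{n}$, independent of $h$ and $k$.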

In particular, if $d = 2$ then $\Omega(\sqrt{n})$ time is always needed, since the potentially marked vertices might all be far from the start vertex. The lower bound of Proposition 29 can be achieved up to a polylogarithmic factor.

Proposition 30 If $G$ is $d$-dimensional for a constant $d > 2$, then

$$Q\bigl(\mathrm{OR}^{(h,\ge k)}, G\bigr) = O\!\left(n^{1/d}\,(h/k)^{1/2-1/d}\,\mathrm{polylog}(h/k)\right).$$

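Proof. Assume without loss of generality that $k = o(h)$, since otherwise a marked item is trivially found. Use algorithm $B$ from Corollary 28 with the following simple change. In iteration $j$, choose $\lceil h/(2^j k) \rceil$ potentially marked vertices $w_1, \ldots, w_{\lceil h/(2^j k) \rceil}$ uniformly at random, and then run the algorithm for a unique marked vertex as if $w_1, \ldots, w_{\lceil h/(2^j k) \rceil}$ were the only vertices in the graph. That is, take $w_1, \ldots, w_{\lceil h/(2^j k) \rceil}$ as 0-pegs; then for all $R \ge 1$, choose $\lceil h/(2^j k\, n_R) \rceil$ vertices of $G$ uniformly at random as $R$-pegs. Lemma 25 goes through if we use $\widehat{\ell}_R := \left(\frac{2n}{\kappa h}\, 2^j k\, n_R \ln\frac{h}{k}\right)^{1/d}$ instead of $\ell_R$. So following Corollary 28, the running time of iteration $j$ is now

$$O\!\left(\sqrt{n_R}\,\Bigl(\frac{n}{h}\, 2^j k \log\frac{h}{k}\Bigr)^{1/d}\right) = O\!\left(n^{1/d}\Bigl(\frac{h}{2^j k}\Bigr)^{1/2-1/d} \log^{1/d}\frac{h}{k}\right)$$

if $n_R = \Theta(h/(2^j k))$. Therefore the total running time is

$$O\!\left(\sum_{j=0}^{\log_2(h/k)} n^{1/d}\Bigl(\frac{h}{2^j k}\Bigr)^{1/2-1/d} \mathrm{polylog}\frac{h}{k}\right) = O\!\left(n^{1/d}\Bigl(\frac{h}{k}\Bigr)^{1/2-1/d} \mathrm{polylog}\frac{h}{k}\right).$$
■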

Intuitively, Proposition 30 says that the worst case for search occurs when the $h$ potentially marked vertices are scattered evenly throughout the graph.

Figure 3: Alice and Bob 'synchronize' locations on their respective cubes.



7 Application to Disjointness 

In this section we show how our results can be used to strengthen a seemingly unrelated result in quantum 
computing. Suppose Alice has a string $X = x_1 \ldots x_n \in \{0,1\}^n$, and Bob has a string $Y = y_1 \ldots y_n \in \{0,1\}^n$.
In the disjointness problem, Alice and Bob must decide with high probability whether there exists an $i$ such
that $x_i = y_i = 1$, using as few bits of communication as possible. Buhrman, Cleve, and Wigderson [12]
observed that in the quantum setting, Alice and Bob can solve this problem using only $O(\sqrt{n} \log n)$ qubits
of communication. This was subsequently improved by Høyer and de Wolf [20] to $O(\sqrt{n}\, c^{\log^* n})$, where $c$ is
a constant and log* n is the iterated logarithm function. Using the search algorithm of Theorem llUI we can 
improve this to $O(\sqrt{n})$, which matches the celebrated $\Omega(\sqrt{n})$ lower bound of Razborov [23].

Theorem 31 The quantum communication complexity of the disjointness problem is $O(\sqrt{n})$.

Proof. The protocol is as follows. Alice and Bob both store their inputs in a 3-D cube $\mathcal{L}_3(n)$ (Figure 3); that is, they let $x_{jkl} = x_i$ and $y_{jkl} = y_i$, where $i = n^{2/3} j + n^{1/3} k + l + 1$ and $j, k, l \in \{0, \ldots, n^{1/3} - 1\}$. To decide whether there exists a $(j, k, l)$ with $x_{jkl} = y_{jkl} = 1$, Alice simply runs our search algorithm for an unknown number of marked items. If the search algorithm is in the state

$$\sum_{j,k,l,z} \alpha_{j,k,l,z}\, |v_{jkl}, z\rangle,$$

then the joint state of Alice and Bob will be

$$\sum_{j,k,l,z,c} \alpha_{j,k,l,z,c}\, |v_{jkl}\rangle \otimes |z\rangle \otimes |c\rangle \otimes |v_{jkl}\rangle, \qquad (1)$$

where Alice holds the first $|v_{jkl}\rangle$ and $|z\rangle$, Bob holds the second $|v_{jkl}\rangle$, and $|c\rangle$ is the communication channel. Thus, whenever Alice is at location $(j, k, l)$ of her cube, Bob is at location $(j, k, l)$ of his cube.

(1) To simulate a query, Alice sends $|z\rangle$ and an auxiliary qubit holding $x_{jkl}$ to Bob. Bob performs $|z\rangle \to |z \oplus y_{jkl}\rangle$, conditional on $x_{jkl} = 1$. He then returns both qubits to Alice, and finally Alice returns the auxiliary qubit to the $|0\rangle$ state by exclusive-OR'ing it with $x_{jkl}$.

(2) To simulate a non-query transformation that does not change $|v_{jkl}\rangle$, Alice just performs it herself.

(3) By examining Algorithms 12 and 13, we see that there are two transformations that change $|v_{jkl}\rangle$. We deal with them separately.

First, step 1 of Algorithm 12 uses a classical C-local transformation $|v_{j,k,l}\rangle \to |v_{j',k',l'}\rangle$. This transformation can be simulated by Alice and Bob each separately applying $|v_{j,k,l}\rangle \to |v_{j',k',l'}\rangle$.

Second, step 2 of Algorithm 13 applies transformations $Z_1$, $Z_2$, and $Z_3$. For brevity, we restrict ourselves to discussing $Z_1$. This transformation maps an initial state $|v_{j,k,l}, 0\rangle$ to a uniform superposition over $|v_{j',k,l}, 0\rangle$ for all $(j', k, l)$ lying in the same $C_i$ as $(j, k, l)$. We can decompose this into a sequence of transformations mapping $|v_{j',k,l}\rangle$ to $\alpha |v_{j',k,l}\rangle + \beta |v_{j'+1,k,l}\rangle$ for some $\alpha, \beta$. This can be implemented in three steps, using an auxiliary qubit. The auxiliary qubit is initialized to $|0\rangle$ and is initially held by Alice. At the end, the auxiliary qubit is returned to $|0\rangle$. The sequence of transformations is

$$\begin{aligned}
|v_{j',k,l}\rangle\, |0\rangle\, |v_{j',k,l}\rangle
&\to \alpha\, |v_{j',k,l}\rangle\, |0\rangle\, |v_{j',k,l}\rangle + \beta\, |v_{j',k,l}\rangle\, |1\rangle\, |v_{j',k,l}\rangle \\
&\to \alpha\, |v_{j',k,l}\rangle\, |0\rangle\, |v_{j',k,l}\rangle + \beta\, |v_{j',k,l}\rangle\, |1\rangle\, |v_{j'+1,k,l}\rangle \\
&\to \alpha\, |v_{j',k,l}\rangle\, |0\rangle\, |v_{j',k,l}\rangle + \beta\, |v_{j'+1,k,l}\rangle\, |0\rangle\, |v_{j'+1,k,l}\rangle .
\end{aligned}$$

The first transformation is performed by Alice, who then sends the auxiliary qubit to Bob. The second transformation is performed by Bob, who then sends the auxiliary qubit back to Alice, who performs the third transformation.

Since the algorithm uses $O(\sqrt{n})$ steps, and each step is simulated using a constant amount of communication, the number of qubits communicated in the disjointness protocol is therefore also $O(\sqrt{n})$. ■
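The cube indexing and the query step of the protocol can be illustrated on classical basis states. This is a minimal sketch under stated assumptions: $n$ is a perfect cube, all function names are hypothetical, and `simulate_query` tracks only a single basis state (one location, one value of $z$) rather than a superposition.

```python
def side_length(n):
    """Side length n^(1/3) of the cube; n is assumed to be a perfect cube."""
    m = round(n ** (1.0 / 3.0))
    if m ** 3 != n:
        raise ValueError("n must be a perfect cube")
    return m


def to_index(j, k, l, n):
    """Map cube coordinates to i = n^(2/3) j + n^(1/3) k + l + 1 (1-based)."""
    m = side_length(n)
    return m * m * j + m * k + l + 1


def to_coords(i, n):
    """Inverse of to_index: recover (j, k, l) from the 1-based index i."""
    m = side_length(n)
    j, rest = divmod(i - 1, m * m)
    k, l = divmod(rest, m)
    return j, k, l


def simulate_query(z, x_bit, y_bit):
    """Basis-state view of step (1); returns (new z, qubits exchanged)."""
    aux = x_bit            # Alice loads x_jkl into the auxiliary qubit
    if aux == 1:           # Bob flips z by y_jkl only when the auxiliary qubit is 1
        z ^= y_bit
    aux ^= x_bit           # Alice uncomputes the auxiliary qubit
    assert aux == 0
    return z, 4            # two qubits sent to Bob, two returned to Alice


if __name__ == "__main__":
    n = 27
    print(to_index(2, 1, 0, n), to_coords(22, n))
    print(simulate_query(0, 1, 1))
```

On a basis state the answer register ends up holding $z \oplus (x_{jkl} \wedge y_{jkl})$, so a query marks exactly the locations where both inputs are 1, at a constant communication cost per query.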

8 Open Problems 

As discussed in Section 3.1, a salient open problem raised by this work is to prove relationships among Z-local, C-local, and H-local unitary matrices. In particular, can any Z-local or H-local unitary be approximated by a product of a small number of C-local unitaries? Also, is it true that $Q(f, G) = \Theta(Q^Z(f, G)) = \Theta(Q^H(f, G))$ for all $f, G$?

A second problem is to obtain interesting lower bounds in our model. For example, let $G$ be a $\sqrt{n} \times \sqrt{n}$ grid, and suppose $f(X) = 1$ if and only if every row of $G$ contains a vertex $v_i$ with $x_i = 1$. Clearly $Q(f, G) = O(n^{3/4})$, and we conjecture that this is optimal. However, we were unable to show any lower bound better than $\Omega(\sqrt{n})$.
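A possible accounting for the $O(n^{3/4})$ upper bound, which the text does not spell out: run a Grover-style search over the $\sqrt{n}$ rows for a row containing no vertex with $x_i = 1$, evaluating each candidate row by walking to it and scanning it. The decomposition below is an assumption of this sketch.

```latex
\underbrace{O(n^{1/4})}_{\text{search over } \sqrt{n} \text{ rows for an all-zero row}}
\times
\underbrace{O(\sqrt{n})}_{\text{walk to a row and scan its } \sqrt{n} \text{ vertices}}
= O(n^{3/4}).
```

Then $f(X) = 1$ exactly when no all-zero row is found.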

Finally, what is the complexity of finding a unique marked vertex on a 2-D square grid? As mentioned earlier, Ambainis, Kempe, and Rivosh showed that $Q\bigl(\mathrm{OR}^{(1)}, \mathcal{L}_2\bigr) = O(\sqrt{n} \log n)$. Can the remaining factor of $\log n$ be removed?